
Added Dynamic Discretisation Support Based on Occupancy Requests #72

Merged · 36 commits · Dec 6, 2023

Commits
60be215
Added plot files
alindkhare Nov 14, 2023
8b2f02c
Merge branch 'main' into alind/dynamic_discretization_heuristic
alindkhare Nov 19, 2023
6be5cb2
Added dynamic Discretization Algo
alindkhare Nov 20, 2023
0d0709e
Initial setup for osdi experiments
AdityaAS Nov 23, 2023
88589f1
Investigating non-deterministism with --release_taskgraphs
AdityaAS Nov 24, 2023
e8c0aea
Commit Ray's OSDI experiment ran on 11/23 night
AdityaAS Nov 25, 2023
5dac556
Update run_alibaba_experiments_osdi.sh to use NUM_INVOCATIONS=400 and…
AdityaAS Nov 25, 2023
06e007e
Remove np.clip from get_release_times gamma generation
AdityaAS Nov 25, 2023
50ec59c
Add a "legit fix" for construct strl when --release_taskgraphs is False
AdityaAS Nov 25, 2023
63924e1
Refactor osdi analysis.ipynb with
AdityaAS Nov 25, 2023
4c57618
Clean stuff up before mergin new main in with experiment run time err…
AdityaAS Nov 26, 2023
f34b0cf
Fix STRL generation bug in DAG-Aware TetriSched.
sukritkalra Nov 26, 2023
371c122
Allow generation of TaskGraph DOT representations in dry-run mode.
sukritkalra Nov 26, 2023
e502cdc
Update analysis.ipynb to optionally remove failed experiment folder
AdityaAS Nov 26, 2023
ef93622
Include previously_placed_tasks in non-DAG-aware TetriSched
sukritkalra Nov 27, 2023
50c7ddf
Added dynamic discretization optimization pass
alindkhare Nov 27, 2023
7263583
Modified STRL cutting
alindkhare Nov 28, 2023
cf607f0
- Update OSDI experiment script
AdityaAS Nov 28, 2023
dac7238
Fix gamma policy inter_arrival_times generation using arrival_rate * …
AdityaAS Nov 29, 2023
6c15fdd
Fix release time generation for gamma arrival pattern
AdityaAS Nov 30, 2023
fcb1ee6
Round instead of clip in gamma release time generation
AdityaAS Nov 30, 2023
647c42f
Merged Ray's branch
alindkhare Nov 30, 2023
f673210
Added temporary print statement
alindkhare Nov 30, 2023
b02b334
Make resource utilization plotting work in osdi analysis.ipynb
AdityaAS Nov 30, 2023
9f95d6f
Use POISSON_ARRIVAL_RATES=(0.2 0.5 1 2) in osdi experiment script
AdityaAS Nov 30, 2023
a2b020d
Commented out Opt pass stdcout
alindkhare Nov 30, 2023
b2ea0eb
Added config that throws error in simulator
alindkhare Nov 30, 2023
a353f0b
Try increasing max deadline var and decreasing arrival rate to achiev…
AdityaAS Nov 30, 2023
710b5f0
Fix a logging error in simulator __handle_task_cancellation
AdityaAS Nov 30, 2023
47ae583
Add analysis_util.py for osdi experiment
AdityaAS Nov 30, 2023
ff8bf20
Update analysis.ipynb based on newly implemented analysis_utils.py
AdityaAS Nov 30, 2023
2fe3dcf
Fix floating point comparisons in Gurobi
sukritkalra Nov 30, 2023
e6867b1
Merge branch 'ray-osdi-experiment' into alind/experiments_alibaba
alindkhare Dec 1, 2023
02d5b28
Fixed capacity constraint map registration of usgae
alindkhare Dec 1, 2023
7ffc94b
Added correct way to calculate occupancy map and occupancy threshold …
alindkhare Dec 3, 2023
9213517
Merge branch 'main' into alind/experiments_alibaba
sukritkalra Dec 4, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -25,3 +25,5 @@ env

# Ignore build of tetrisched
schedulers/tetrisched/build/*

experiments/*
20 changes: 20 additions & 0 deletions configs/alibaba_edf_fixed.conf
@@ -0,0 +1,20 @@
--log_file_name=./alibaba_scheduler_EDF_num_invocation_50.log
--csv_file_name=./alibaba_scheduler_EDF_num_invocation_50.csv
--log_level=debug
--execution_mode=replay
--replay_trace=alibaba
--max_deadline_variance=100
--min_deadline_variance=50
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--override_num_invocations=50
--override_arrival_period=10
--randomize_start_time_max=100
--worker_profile_path=profiles/workers/alibaba_cluster.yaml
--scheduler_runtime=0
--scheduler=EDF
# --enforce_deadlines
# --retract_schedules
# --drop_skipped_tasks
# # --release_taskgraphs
# --scheduler_log_times=10
# --scheduler_time_discretization=1
2 changes: 1 addition & 1 deletion configs/alibaba_tetrisched_adaptive_discretization.conf
@@ -6,7 +6,7 @@
--max_deadline_variance=100
--min_deadline_variance=50
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--override_num_invocations=1
--override_num_invocations=50
--override_arrival_period=10
--randomize_start_time_max=100
--worker_profile_path=profiles/workers/alibaba_cluster.yaml
9 changes: 7 additions & 2 deletions configs/alibaba_tetrisched_discrete_1.conf
@@ -6,15 +6,20 @@
--max_deadline_variance=100
--min_deadline_variance=50
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--override_num_invocations=1
--override_num_invocations=50
--override_arrival_period=10
--randomize_start_time_max=100
--worker_profile_path=profiles/workers/alibaba_cluster.yaml
--scheduler_runtime=0
--scheduler=TetriSched
# --scheduler=EDF
--enforce_deadlines
--retract_schedules
--drop_skipped_tasks
--release_taskgraphs
--scheduler_log_times=10
--scheduler_time_discretization=1
--scheduler_time_discretization=5

# --override_release_policy=gamma
# --override_poisson_arrival_rate=10
# --override_gamma_coefficient=3
20 changes: 20 additions & 0 deletions configs/alibaba_tetrisched_discrete_1_no_dag_awareness.conf
@@ -0,0 +1,20 @@
--log_file_name=./alibaba_scheduler_TetriSched_release_policy_fixed_deadline_var_100_scheduler_discretization_1_no_dag_awareness.log
--csv_file_name=./alibaba_scheduler_TetriSched_release_policy_fixed_deadline_var_100_scheduler_discretization_1_no_dag_awareness.csv
--log_level=debug
--execution_mode=replay
--replay_trace=alibaba
--max_deadline_variance=100
--min_deadline_variance=50
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--override_num_invocations=50
--override_arrival_period=10
--randomize_start_time_max=100
--worker_profile_path=profiles/workers/alibaba_cluster.yaml
--scheduler_runtime=0
--scheduler=TetriSched
--enforce_deadlines
--retract_schedules
--drop_skipped_tasks
# --release_taskgraphs
--scheduler_log_times=10
--scheduler_time_discretization=1
23 changes: 23 additions & 0 deletions configs/alibaba_tetrisched_dynamic_discretization_1_5.conf
@@ -0,0 +1,23 @@
--log_file_name=./alibaba_scheduler_TetriSched_release_policy_fixed_deadline_var_100_scheduler_dynamic_discretization_1_5_auto_occupancy_0.7.log
--csv_file_name=./alibaba_scheduler_TetriSched_release_policy_fixed_deadline_var_100_scheduler_dynamic_discretization_1_5_auto_occupancy_0.7.csv
--log_level=debug
--execution_mode=replay
--replay_trace=alibaba
--max_deadline_variance=100
--min_deadline_variance=50
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--override_num_invocations=50
--override_arrival_period=10
--randomize_start_time_max=100
--worker_profile_path=profiles/workers/alibaba_cluster.yaml
--scheduler_runtime=0
--scheduler=TetriSched
--enforce_deadlines
--retract_schedules
--drop_skipped_tasks
--release_taskgraphs
--scheduler_log_times=10
--scheduler_time_discretization=1
--scheduler_dynamic_discretization
--scheduler_max_time_discretization=5
--scheduler_max_occupancy_threshold=0.7
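The optimization pass itself is not shown in this excerpt, but the three flags above (`--scheduler_time_discretization=1`, `--scheduler_max_time_discretization=5`, `--scheduler_max_occupancy_threshold=0.7`) suggest the shape of the policy: heavily occupied time slices get the finest granularity, lightly occupied slices are coarsened. A minimal sketch of that idea — the helper and its linear coarsening rule are invented for illustration, not taken from the PR:

```python
def choose_discretization(occupancy: float,
                          max_occupancy_threshold: float = 0.7,
                          min_discretization: int = 1,
                          max_discretization: int = 5) -> int:
    # Hypothetical sketch: a slice whose occupancy (a 0-1 fraction) meets or
    # exceeds the threshold gets the finest granularity; emptier slices are
    # coarsened linearly up to the maximum discretization.
    if occupancy >= max_occupancy_threshold:
        return min_discretization
    scale = 1.0 - occupancy / max_occupancy_threshold
    return min(max_discretization,
               min_discretization
               + round(scale * (max_discretization - min_discretization)))
```

With the config above, a slice at 90% occupancy would be discretized at 1 µs while an idle slice would fall back to the 5 µs maximum.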
@@ -0,0 +1,23 @@
--log_file_name=./alibaba_scheduler_TetriSched_release_policy_fixed_deadline_var_100_scheduler_dynamic_discretization_1_5_max_occ_1100.log
--csv_file_name=./alibaba_scheduler_TetriSched_release_policy_fixed_deadline_var_100_scheduler_dynamic_discretization_1_5_max_occ_1100.csv
--log_level=debug
--execution_mode=replay
--replay_trace=alibaba
--max_deadline_variance=100
--min_deadline_variance=50
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--override_num_invocations=50
--override_arrival_period=10
--randomize_start_time_max=100
--worker_profile_path=profiles/workers/alibaba_cluster.yaml
--scheduler_runtime=0
--scheduler=TetriSched
--enforce_deadlines
--retract_schedules
--drop_skipped_tasks
--release_taskgraphs
--scheduler_log_times=10
--scheduler_time_discretization=1
--scheduler_dynamic_discretization
--scheduler_max_time_discretization=5
--scheduler_max_occupancy_threshold=1100
2 changes: 1 addition & 1 deletion configs/alibaba_trace.conf
Expand Up @@ -7,7 +7,7 @@
--execution_mode=replay
--replay_trace=alibaba
--workload_profile_path=./traces/alibaba-cluster-trace-v2018/alibaba_random_50_dags.pkl
--batch_size_job_loading=25
# --batch_size_job_loading=25
--override_num_invocations=1
--override_arrival_period=10
--randomize_start_time_max=50
1 change: 1 addition & 0 deletions data/alibaba_loader.py
@@ -1,3 +1,4 @@
import json
import math
import os
import pathlib
2 changes: 1 addition & 1 deletion data/csv_reader.py
@@ -53,11 +53,11 @@ def parse_events(self, readings: Mapping[str, Sequence[str]]):
schedulers = []
for reading in csv_readings:
try:
# TODO: This
if reading[1] == "SIMULATOR_START":
simulator = Simulator(
csv_path=csv_path,
start_time=int(reading[0]),
total_tasks=reading[2],
)
elif reading[1] == "UPDATE_WORKLOAD":
simulator.total_tasks += int(reading[2])
622,740 changes: 622,740 additions & 0 deletions experiments/analysis.ipynb

Large diffs are not rendered by default.

173 changes: 173 additions & 0 deletions experiments/analysis_utils.py
@@ -0,0 +1,173 @@
import os
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

def calculate_arrival_rate_and_cv2(release_time: list[int]):
release_time.sort()
inter_arrival_times = np.diff(release_time)
avg_inter_arrival_time = np.mean(inter_arrival_times)
std_inter_arrival_time = np.std(inter_arrival_times)
cv2 = (std_inter_arrival_time/avg_inter_arrival_time) ** 2
return 1/avg_inter_arrival_time, cv2

def find_all_file_paths(path, ends_with=".csv"):
csv_file_paths = []
if os.path.isdir(path):
for filename in os.listdir(path):
if filename.endswith(ends_with):
csv_file_paths.append(os.path.join(path, filename))
else:
csv_file_paths += find_all_file_paths(os.path.join(path, filename), ends_with)
return csv_file_paths

def extract_variables_from_filename(filename):
# Split the filename by underscores
parts = filename.split('_')

# Extract the variables based on your format
replay_trace = parts[0]
scheduler = parts[2]
release_policy = parts[5]
deadline_var = int(parts[9])
dag_aware = parts[12] == "1"

try:
arrival_rate = float(parts[16])
cv2 = int(parts[19].split('.')[0]) # Assuming the file extension is .csv
except:
# Before 11/28 afternoon, I used a different format for the filename and didn't include the arrival rate and CV2
arrival_rate = 10
cv2 = 2

if scheduler == "TetriSched":
scheduler_time_discretization = int(parts[-1].split('.')[0])
scheduler = f"TetriSched_time_dis_{scheduler_time_discretization}" + ("_DAG_aware" if dag_aware else "")
else:
scheduler_time_discretization = None

# Create a dictionary to store the extracted variables
variables = {
'trace': replay_trace,
'release_policy': release_policy,
'max_deadline_variance': deadline_var,
'scheduler': scheduler,
'DAG_aware': dag_aware,
'scheduler_time_discretization': scheduler_time_discretization,
"arrival_rate": arrival_rate,
"cv2": cv2,
}

return variables


def extract_experiments_result(base_dir: str) -> pd.DataFrame:
rows = []
# Loop through each folder and process the CSV file
for csv_file_path in find_all_file_paths(base_dir):
file_name = csv_file_path.split(os.sep)[-1]
try:
# Open the CSV file and read the last line
with open(csv_file_path, 'r') as file:
lines = file.readlines()
last_line = lines[-1]

end_time, _, finished_tasks, cancelled_tasks, missed_task_deadlines, finished_task_graphs, cancelled_task_graphs, missed_task_graph_deadlines = last_line.split(",")
row = extract_variables_from_filename(file_name)
# Analyze SLO attainment and goodput
slo_attainment = (int(finished_task_graphs) - int(missed_task_graph_deadlines)) / (int(cancelled_task_graphs) + int(finished_task_graphs))
row["slo_attainment"] = slo_attainment
row["goodput"] = int(finished_tasks)
row["csv_file_path"] = csv_file_path

# Calculate the arrival rate and cv2
release_times = []
for line in lines:
if "TASK_RELEASE" not in line:
continue
# event_time should be the actual release time
event_time, _, task_name, _, task_intended_release_time, task_release_time, task_deadline, task_id, task_graph = line.strip().split(",")
release_times.append(int(task_release_time))

actual_arrival_rate, actual_cv2 = calculate_arrival_rate_and_cv2(release_times)
row["actual_arrival_rate"] = actual_arrival_rate
row["actual_cv2"] = actual_cv2

rows.append(row)
except FileNotFoundError:
print(f"File not found: {csv_file_path}")
except Exception as e:
print(f"An error occurred while processing {csv_file_path}: {str(e)}")
# I want to remove the parent folder of the CSV file
# print(f"Removing {os.path.dirname(csv_file_path)}")
# shutil.rmtree(os.path.dirname(csv_file_path))

return pd.DataFrame(rows)


def plot_slo_attainments(data: pd.DataFrame):
# Define your unique values for the grid
cv2_values = sorted(data["cv2"].unique())
arrival_rate_values = sorted(data["arrival_rate"].unique())
scheduler_values = ["TetriSched_time_dis_20", "TetriSched_time_dis_20_DAG_aware", "TetriSched_time_dis_10",
"TetriSched_time_dis_10_DAG_aware", "TetriSched_time_dis_1", "TetriSched_time_dis_1_DAG_aware", "EDF"]

# Number of schedulers
n_schedulers = len(scheduler_values)

# Create a subplot grid
fig, axes = plt.subplots(len(arrival_rate_values), len(cv2_values), figsize=(20, 15), sharey=True)

# Define the width of each bar and the spacing between them
bar_width = 0.20
spacing = 0.05
group_width_factor = 2 # Increase this factor to widen the distance between groups

# Collect handles and labels for the legend
handles, labels = [], []

# Iterate over each subplot and plot the data
for i, arrival_rate in enumerate(arrival_rate_values):
for j, cv2 in enumerate(cv2_values):
ax = axes[i][j]
subset = data[(data['arrival_rate'] == arrival_rate) & (data['cv2'] == cv2)]

# Get unique deadline variances
deadline_vars = sorted(subset['max_deadline_variance'].unique())
x = np.arange(len(deadline_vars)) * group_width_factor # Adjust x positions

for k, scheduler in enumerate(scheduler_values):
scheduler_data = subset[subset['scheduler'] == scheduler]
# Calculate the position of each bar
bar_positions = x - (n_schedulers * bar_width / 2) + (k * bar_width) + (spacing * k)
# Some bars may not exist for some schedulers
slo_attainments = []
for deadline_var in deadline_vars:
if len(scheduler_data[scheduler_data['max_deadline_variance'] == deadline_var]['slo_attainment']) == 0:
slo_attainments.append(0)
else:
slo_attainments.append(scheduler_data[scheduler_data['max_deadline_variance'] == deadline_var]['slo_attainment'].item())

ax.bar(bar_positions, slo_attainments, width=bar_width, label=scheduler)

for c in ax.containers:
labels = [f'{(v.get_height() * 100):.1f}' for v in c]
ax.bar_label(c, labels=labels, label_type='edge', rotation=45, size=8)

ax.set_xticks(x)
ax.set_xticklabels(deadline_vars)
ax.set_title(f"Arrival Rate: {subset['actual_arrival_rate'].mean():.2f}, CV2: {subset['actual_cv2'].mean():.2f}")
ax.set_xlabel('Max Deadline Variance')
ax.set_ylabel('SLO Attainment')

# Adjust layout and add a super title
plt.tight_layout()
plt.subplots_adjust(top=0.9) # Adjust the bottom parameter to make space for the legend

handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center', bbox_to_anchor=(0.5, 0.95), ncol=len(labels))

plt.suptitle('SLO Attainment Comparison (min_deadline_var=10, num_invocation=400) 11_29_2023', size=16)

# Show the plot
plt.show()
14 changes: 14 additions & 0 deletions main.py
@@ -242,12 +242,24 @@
"If `True`, the scheduler creates space-time matrix non-uniformly. "
"The discretization is finer initially, and coarser at the end. (default: False)",
)
flags.DEFINE_bool(
"scheduler_dynamic_discretization",
False,
"If `True`, the scheduler creates space-time matrix non-uniformly. "
"The discretization is dynamically decided based on the occupancy request for each time slice. (default: False)",
)
flags.DEFINE_integer(
"scheduler_max_time_discretization",
5,
"The maximum discretization that the scheduler can have (in µs). "
"Only used when scheduler_adaptive_discretization flag is enabled. (default: 5)",
)
flags.DEFINE_float(
"scheduler_max_occupancy_threshold",
0.8,
"The percentage b/w 0 and 1 of maximum occupancy beyond which the discretization would always be 1 incase of dynamic discretization. "
"This flag is only used when dynamic discretization is enabled (default: 0.8)",
)
flags.DEFINE_integer(
"scheduler_delay",
0,
@@ -623,6 +635,8 @@ def main(args):
max_time_discretization=EventTime(
FLAGS.scheduler_max_time_discretization, EventTime.Unit.US
),
dynamic_discretization=FLAGS.scheduler_dynamic_discretization,
max_occupancy_threshold=FLAGS.scheduler_max_occupancy_threshold,
)
else:
raise ValueError(
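Putting the new flags together, a run matching the `alibaba_tetrisched_dynamic_discretization_1_5.conf` file earlier in this diff would look roughly like the following (replay-mode, trace, and logging flags omitted for brevity):

```shell
python main.py \
  --scheduler=TetriSched \
  --enforce_deadlines \
  --retract_schedules \
  --release_taskgraphs \
  --scheduler_time_discretization=1 \
  --scheduler_dynamic_discretization \
  --scheduler_max_time_discretization=5 \
  --scheduler_max_occupancy_threshold=0.7
```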