Pkg.add("Plots")
Pkg.add("Lint")
Pkg.add("Gadfly")
Pkg.add("ProfileView")
Pkg.add("CSV")
Pkg.add("StatsBase")
Pkg.add("StatsModels")
Pkg.add("GLM")
Pkg.add("RDatasets")
Pkg.add("IterTools")
Pkg.add("Missings")
Pkg.add("RCall")
Pkg.add("DataFrames")
Pkg.update()
Installing R dependencies:
install.packages(c("ggplot2", "dplyr", "tidyr", "rjson", "GGally",
"plotly", "rPref", "pracma", "FrF2", "AlgDesign",
"quantreg"),
repos = "https://mirror.ibcp.fr/pub/CRAN/")
Cloning and updating the legup-tuner
repository:
git clone https://github.com/phrb/legup-tuner.git || (cd legup-tuner && git pull)
Export the current path to the repository_dir variable:
pwd | tr -d "\n"
git clone https://github.com/phrb/autotuning_screening_experiment.git || (cd autotuning_screening_experiment && git pull)
We are also interested in exploring new design construction metrics, in order to understand which factors are best estimated at each step and possibly discard designs not capable of estimating parameters of interest.
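One candidate metric is the D-criterion itself, which optFederov from AlgDesign already reports for each constructed design. The sketch below uses a hypothetical two-factor space, not the SPAPT search space, to show how such metrics can be inspected before committing to a design.
library(AlgDesign)

# hypothetical two-factor candidate space; the real candidates come from SPAPT
candidates <- expand.grid(RT_I = 1:5, T1_I = 1:10)

# 12-point D-optimal design for a linear-plus-quadratic model
design <- optFederov(~ RT_I + I(RT_I^2) + T1_I + I(T1_I^2),
                     data = candidates, nTrials = 12, criterion = "D")

design$D   # k-th root of the generalized variance of the information matrix
design$Ge  # G-efficiency, related to the worst-case prediction variance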
We chose to start with the bicg SPAPT kernel. It was one of the kernels where our approach achieved similar-performing solutions using fewer experiments than a uniform random sampling approach, across 10 repetitions of an experiment with a fixed budget of 400 experiments and a limit of 4 iterations, or steps, of our approach.
The following sections describe our ongoing exploration of the performance
modeling and optimization of the bicg
kernel.
Figure fig:best-found shows the iterations where the best configuration of each of the 10 runs was found, for uniform sampling and for our approach.
The results are consistent with the ones presented in the paper, but the data are not the same. The data for the figures in this report were obtained from a new set of 10 repetitions of the same experiment, using a performance model with only linear and quadratic terms, that is, removing the cubic terms used in the paper's experiments.
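For reference, the difference between the two models can be written as R formulas; the factor names below are placeholders, not the full set of bicg parameters:
# paper's model: linear, quadratic, and cubic terms for each factor
cubic_model     <- cost_mean ~ RT_I + I(RT_I^2) + I(RT_I^3) + T1_I + I(T1_I^2) + I(T1_I^3)
# model used in this report: linear and quadratic terms only
quadratic_model <- cost_mean ~ RT_I + I(RT_I^2) + T1_I + I(T1_I^2)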
library(ggplot2)
library(dplyr)
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 4, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
#geom_label_repel(data = . %>% group_by(application) %>%
# filter(technique == "RS") %>%
# filter(best_iteration == min(best_iteration)),
# aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
#scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.9) +
#annotation_logticks(sides = "b") +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 30) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
#text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
#scale_color_brewer(palette = "Set1")
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 30) +
theme(text = element_text(family = "serif"),
legend.position = c(0.2, 0.5),
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
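# keep only the grobs, and their layout rows, selected by the logical vector passed in ...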
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
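# locate the dummy plot's panels and facet strips, move the panels up one row so they occupy the strip area, and keep only those grobs to serve as colored strip backgrounds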
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
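# overlay g2 on g1: push g1's grobs to a lower drawing order (z) and append g2's layout on top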
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
mixed_plot_data = it_data %>%
mutate(technique = gsub(pattern = "DLMT",
replacement = "No Cubic Terms",
x = technique, fixed = TRUE))
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
The results are very similar to the ones in the paper, but our approach found slightly better configurations slightly faster. Random sampling also found a much better configuration, much faster than before, in one experiment.
Figure fig:eliminated-terms shows the count of model terms that were eliminated at each color-coded DLMT step. As before, we see that OMP and SCR were the most frequently eliminated factors, especially on the first step.
base_size <- 25
ggplot(subset(eliminated_terms, removed_variables != ""),
aes(x = removed_variables,
fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = c(0.8, 0.95),
legend.background = element_blank(),
legend.direction = "horizontal",
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(name = "Step", palette = "Set2")
Figure fig:encpnc-ff-ss-be shows the factors that were identified as significant by ANOVA, with a significance threshold of
It is interesting that only parasilo-19, parasilo-20, and parasilo-9 eliminated any factor other than OMP and SCR on the first step where any factor was eliminated, and that those runs fixed the factor RT_I to the same value.
We can also see that OMP seems to be the parameter behind the most extreme changes in the execution time of tested configurations. This is especially clear for parasilo-13 and parasilo-10, where the explored configurations have relatively high execution times until after step 3, where OMP is fixed.
A very slight worsening of execution times, that is, an increase, can be seen at the fourth step of parasilo-11, after RT_I was fixed. The minimum execution time seems to be higher than in the third step.
./img/explored_space_no_cubic.pdf
./img/models_fixed_omp_no_cubic.pdf
Figure fig:nc-predicted-best compares the performance predicted by the fitted model, represented by the green line, with the performance of the best point found at each design, represented by the blue line. The red line marks the best point found so far.
./img/predictions_no_cubic.pdf
It is only at the fourth step of parasilo-17 that the point with the best predicted performance was better than the best point on the design for that step, while also being better than the best point found so far. Although we seem to be effectively restricting the search space with our exploration, which is evidenced by the improvement that occurs as steps progress and by the best points being found inside designs, the models fitted using experiment data are not able to improve the current best in the majority of cases.
2.1.1.1.2.5.1 D-Criterion
Since OMP was quickly fixed at most iterations of the previous experiment, and had the largest observable impact on performance, we decided to fix it, along with all other binary parameters.
In the experiments shown in this section, all binary parameters are turned on by default, including in the random sampling experiments. Figure fig:nobin-best-found shows the iterations where the best configuration of each of the 10 runs was found, for uniform sampling and for our approach. We see that random sampling found an extremely fast point right at the second tested configuration.
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(subset(it_data, technique == "DLMT")$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 2, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.45) +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 17) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 17) +
theme(text = element_text(family = "serif"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
Even after fixing binary parameters, or maybe because of it, some parameters were still identified as significant and fixed. The T1_I parameter was fixed at only two iterations in the previous experiment, but it was the most frequently fixed parameter in this one, surpassing RT_I, which was frequently eliminated in both experiments. This suggests that fixing binary parameters allowed the significance of the other parameters to be detected.
base_size <- 18
ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(palette = "Set1")
We see that parasilo-13 and parasilo-23 did not eliminate any parameters after 4 steps. We also see that, at parasilo-12, parasilo-20, and parasilo-25, for example, the search space seems to be restricted to a region with worse performance than in the previous steps of the same experiment. This could be due to the fact that these experiments fixed the RT_I parameter to its fifth level. A slight improvement in the configurations tested can be seen in the other experiments, where other factors were fixed.
Identifying when a restriction of the search space to a worse region happens would be useful if we could revert or redo an optimization step. This is an interesting motivation for keeping a human in the loop of the optimization process.
2.1.1.1.3.4.1 Performance at Each Step
2.1.1.1.3.4.2 D-Criterion
Significant changes were made to the initial DLMT implementation. We decided that there was no good reason not to reuse the data obtained from evaluating designs at each step, nor the various samples of the search space taken at different points. Now, all evaluated experiments compose a single growing design used by aov to identify the best factors, and all samples from the search space compose a single data set used by optFederov to select experiments. The data set is pruned in both the aov and lm analyses, to guarantee that only experiments with the correct levels of the fixed factors are used. This is crucial for both analyses, since having different values of fixed factors that are not in the model would introduce unexplained variance into the data set.
Using all experimental data in aov is interesting because it is always worth considering additional information on the factors currently being studied. On one hand, it might not allow enough “flexibility” when we consider regression only on a small restricted subspace, because points outside the subspace would influence the regression, while we are interested in the significance of the factor inside the subspace at that moment. On the other hand, using all available data makes sense because we are also interested in exploring any global structure of the search space, and keeping points from other subspaces increases the chance of “catching” the significance of a factor globally.
Using the entire sampled space, across all steps, as the candidate set for optFederov has no downsides, provided we prune it to account for the constraints on factor levels fixed in previous steps. We increase the size of the set of configurations available to compose a new design each time we sample a new subspace. This will hopefully improve the quality of the designs produced as we progress.
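A minimal sketch of this reuse strategy follows, with illustrative data standing in for the accumulated SPAPT measurements and samples; measured, sampled, and the factor names and ranges here are assumptions, not the actual implementation.
library(AlgDesign)

set.seed(1)
# stand-ins for the accumulated search-space samples and measured configurations
sampled <- expand.grid(OMP = c("True", "False"), RT_I = 1:5, T1_I = 1:10)
measured <- sampled[sample(nrow(sampled), 30), ]
measured$cost_mean <- rnorm(30, mean = 2, sd = 0.2)

# factors fixed on previous steps constrain both analyses
fixed_levels <- list(OMP = "True")
prune_fixed <- function(data, fixed) {
  for (f in names(fixed)) data <- data[data[[f]] == fixed[[f]], , drop = FALSE]
  data
}

# ANOVA over all pruned measurements accumulated so far
summary(aov(cost_mean ~ RT_I + I(RT_I^2) + T1_I + I(T1_I^2),
            data = prune_fixed(measured, fixed_levels)))

# D-optimal selection of the next design from all pruned candidates
next_design <- optFederov(~ RT_I + I(RT_I^2) + T1_I + I(T1_I^2),
                          data = prune_fixed(sampled, fixed_levels),
                          nTrials = 12, criterion = "D")$design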
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 4, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
#geom_label_repel(data = . %>% group_by(application) %>%
# filter(technique == "RS") %>%
# filter(best_iteration == min(best_iteration)),
# aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
#scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.9) +
#annotation_logticks(sides = "b") +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 30) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
#text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
#scale_color_brewer(palette = "Set1")
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 30) +
theme(text = element_text(family = "serif"),
legend.position = c(0.2, 0.5),
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
mixed_plot_data = bind_rows(mixed_plot_data,
it_data %>%
mutate(technique = gsub(pattern = "DLMT",
replacement = "Reusing Designs",
x = technique, fixed = TRUE)) %>%
filter(technique != "RS"))
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
base_size <- 25
ggplot(subset(eliminated_terms, removed_variables != ""),
aes(x = removed_variables,
fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = c(0.8, 0.95),
legend.background = element_blank(),
legend.direction = "horizontal",
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(name = "Step", palette = "Set2")
./img/explored_space_reuse.pdf
./img/models_fixed_omp_reuse.pdf
2.1.1.1.4.5.1 D-Criterion
This experiment uses an initial model with linear and quadratic terms for all factors. All binary parameters were fixed to their False value. We hoped to determine whether the effects of the binary factors, which are proportionally much larger than the effects of the other factors, were preventing the detection of the significance of the other factors.
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 2, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
#geom_label_repel(data = . %>% group_by(application) %>%
# filter(technique == "RS") %>%
# filter(best_iteration == min(best_iteration)),
# aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
#scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.45) +
#annotation_logticks(sides = "b") +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 17) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
#scale_color_brewer(palette = "Set1")
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 17) +
theme(text = element_text(family = "serif"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
base_size <- 18
ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(palette = "Set1")
2.1.1.1.5.5.1 D-Criterion
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 2, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
#geom_label_repel(data = . %>% group_by(application) %>%
# filter(technique == "RS") %>%
# filter(best_iteration == min(best_iteration)),
# aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
#scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.45) +
#annotation_logticks(sides = "b") +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 17) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
#scale_color_brewer(palette = "Set1")
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 17) +
theme(text = element_text(family = "serif"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
mixed_plot_data = bind_rows(mixed_plot_data,
it_data %>%
mutate(technique = gsub(pattern = "DLMT",
replacement = "Quantile Regression (tau = 0.05)",
x = technique, fixed = TRUE)) %>%
filter(technique != "RS"))
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
base_size <- 18
ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(palette = "Set1")
2.1.1.1.6.5.1 D-Criterion
Figure fig:nobin-best-found shows the iterations where the best configuration of each of the 10 runs was found, for uniform sampling and for our approach.
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(subset(it_data, technique == "DLMT")$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 2, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.45) +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 17) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 17) +
theme(text = element_text(family = "serif"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
mixed_plot_data = bind_rows(mixed_plot_data,
it_data %>%
mutate(technique = gsub(pattern = "DLMT",
replacement = "Running 8 Steps",
x = technique, fixed = TRUE)) %>%
filter(technique != "RS"))
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
base_size <- 18
ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(palette = "Set1")
2.1.1.1.7.4.2 D-Criterion
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor",
"gesummv", "dgemv3", "lu", "mvt", "seidel",
"jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor",
"gesummv", "dgemv3", "lu", "mvt",
"seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm",
"tensor", "gesummv", "dgemv3", "lu", "mvt",
"seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"dgemv", "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 4, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
#geom_label_repel(data = . %>% group_by(application) %>%
# filter(technique == "RS") %>%
# filter(best_iteration == min(best_iteration)),
# aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
geom_vline(aes(xintercept = mean_cost_baseline,
size = "-O3"),
linetype = 8,
stat = "unique",
color = "black") +
#scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
scale_y_continuous(limits = c(-20, 400),
breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_x_continuous(limits = c(0, 2.5),
breaks = seq(0, 2.5, length.out = 5)) +
scale_size_manual("", values = 0.9) +
#annotation_logticks(sides = "b") +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 30) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
#text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
scale_color_brewer(palette = "Set1")
#scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 30) +
theme(text = element_text(family = "serif"),
legend.position = c(0.2, 0.5),
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
# mixed_plot_data = bind_rows(mixed_plot_data,
# it_data %>%
# mutate(technique = gsub(pattern = "DLMT",
# replacement = "Reusing Designs",
# x = technique, fixed = TRUE)) %>%
# filter(technique != "RS"))
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
base_size <- 25
ggplot(subset(eliminated_terms, removed_variables != ""),
aes(x = removed_variables,
fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = c(0.8, 0.95),
legend.background = element_blank(),
legend.direction = "horizontal",
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(name = "Step", palette = "Set2")
./img/explored_space_reuse.pdf
./img/models_fixed_omp_reuse.pdf
2.1.1.1.8.5.1 D-Criterion
library(ggplot2)
library(dplyr)
library(latex2exp)
library(patchwork)
library(paletteer)
plot_df = read.csv("data/complete_bicgkernel_data.csv") %>%
mutate(technique = factor(technique,
levels = c("RS",
"Original",
"No Cubic Terms",
"Reusing Designs",
"Running 8 Steps",
"Quantile Regression (tau = 0.05)"),
labels = c("RS",
"Original",
"No Cubic",
"Reuse Data",
"Extra Steps",
"QR (5%)")))
p1 = ggplot() +
geom_jitter(data = plot_df,
aes(y = min_run_cost,
x = technique,
color = best_iteration),
stat = "unique",
width = 0.2,
height = 0.0,
size = 4) +
geom_hline(data = plot_df,
aes(yintercept = min(mean_cost_baseline)),
stat = "unique",
size = 1.1,
linetype = 2,
color = "black") +
geom_text(data = plot_df,
aes(x = 0.5,
y = min(mean_cost_baseline) - 0.14),
size = 10,
stat = "unique",
label = "Baseline (-O3)",
hjust = 0,
color = "black") +
ylab("Execution Time (s)") +
scale_color_paletteer_c(name = "Iteration Found",
n.breaks = 6,
direction = -1,
palette = "oompaBase::greenscale") +
theme_bw(base_size = 33) +
theme(legend.position = c(0.7, 0.86),
legend.direction = "horizontal",
legend.title = element_text(size = 25),
legend.key.width = unit(2, "cm"),
axis.title.x = element_blank(),
legend.background = element_blank())
p2 = ggplot() +
geom_jitter(data = plot_df,
aes(y = best_iteration,
x = technique,
color = min_run_cost),
stat = "unique",
width = 0.2,
height = 0.0,
size = 4) +
ylab("Iteration Finding the Best Point") +
scale_color_paletteer_c(name = "Execution Time",
n.breaks = 6,
direction = -1,
palette = "oompaBase::greenscale") +
theme_bw(base_size = 33) +
theme(legend.position = c(0.7, 0.9),
legend.direction = "horizontal",
legend.title = element_text(size = 25),
legend.key.width = unit(2, "cm"),
axis.title.x = element_blank(),
legend.background = element_blank())
p1 / p2
library(grid)
library(gtable)
library(ggrepel)
library(utf8)
it_data <- complete_plot_data
it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
"lu", "mvt", "seidel", "jacobi"))
it_data$header <- rep("0", nrow(it_data))
it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"
it_data$header <- factor(it_data$header, levels = c("C"))
levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
"[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")
it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]
p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
facet_wrap(facet ~ ., ncol = 4) +
geom_point(size = 2, pch = 19) +
stat_ellipse(type = "t", linetype = 13) +
#geom_label_repel(data = . %>% group_by(application) %>%
# filter(technique == "RS") %>%
# filter(best_iteration == min(best_iteration)),
# aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
#scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
scale_size_manual("", values = 0.45) +
#annotation_logticks(sides = "b") +
ggtitle("") +
ylab("Iteration where Best was Found") +
xlab("Best Cost in Seconds") +
guides(color = guide_legend(reverse = TRUE)) +
theme_bw(base_size = 17) +
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
text = element_text(family = "serif"),
strip.background = element_rect(fill = "white"),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
#scale_color_brewer(palette = "Set1")
scale_color_grey(start = 0.3, end = 0.7)
dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
facet_wrap(facet ~ ., scale = "free", ncol = 4) +
geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
ymin = -Inf, ymax = Inf) +
theme_minimal(base_size = 17) +
theme(text = element_text(family = "serif"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
) +
scale_fill_brewer(palette = "Pastel2", direction = -1)
#scale_fill_grey()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)
gtable_select <- function (x, ...)
{
matches <- c(...)
x$layout <- x$layout[matches, , drop = FALSE]
x$grobs <- x$grobs[matches]
x
}
panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1
new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)
gtable_stack <- function(g1, g2){
g1$grobs <- c(g1$grobs, g2$grobs)
g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
g1$layout <- rbind(g1$layout, g2$layout)
g1
}
new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
The results are very similar to the ones in the paper, but our approach found slightly better configurations slightly faster. Random sampling also found a much better configuration, much faster than before, in one experiment.
2.1.1.1.9.0.2 Eliminated Factors and Fixed Values
Figure fig:eliminated-terms shows the count of model terms that were eliminated at each color-coded DLMT step. As before, we see that OMP and SCR were the most frequently eliminated factors, especially on the first step.
base_size <- 18
ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
geom_bar(stat = "Count", show.legend = TRUE) +
ylab("Eliminated Terms") +
xlab("") +
scale_x_discrete() +
theme_bw(base_size = base_size) +
theme(text = element_text(family = "sans"),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
guides(fill = guide_legend(nrow = 1)) +
scale_fill_brewer(palette = "Set1")
2.1.1.1.9.0.4 Predicted Performance at Each Step
Figure fig:nc-predicted-best compares the performance predicted by the fitted model, represented by the green line, with the performance of the best point found at each design, represented by the blue line. The red line marks the best point found so far.
It is only at the fourth step of parasilo-17 that the point with the best predicted performance was better than the best point on the design for that step, while also being better than the best point found so far. Although we seem to be effectively restricting the search space with our exploration, which is evidenced by the improvement that occurs as steps progress and by the best points being found inside designs, the models fitted using experiment data are not able to improve the current best in the majority of cases.
2.1.1.1.9.0.5 D-Optimality
2.1.1.1.9.0.5.1 D-Criterion
./img/bicgkernel_gaussian_pareto.pdf
./img/bicgkernel_rs_pareto.pdf
optimisation start
* estimation method   : MLE
* optimisation method : BFGS
* analytical gradient : used
* trend model         : ~1
* covariance model    : matern5_2, no nugget
* parameter bounds    : lower 1e-10 for all 48 parameters, upper given by each factor's range
* best initial criterion value(s) : -712.9143
N = 48, M = 5, machine precision = 2.22045e-16
At X0, 0 variables are exactly at the bounds
At iterate 0   f = 712.91 |proj g| = 0.15272
[... intermediate BFGS iterations omitted ...]
At iterate 101 f = 664.82 |proj g| = 0.01516
final value 664.821889
stopped after 101 iterations
library(ggplot2)
df <- data.frame(mean = x$S$mean[1, ],
var = x$S$var[1, ],
varPG = x$S$varPG[1, ],
ci = sqrt(x$S$var[1, ]) * 1.96,
ci_inf = x$S$ci[1, ],
ci_sup = x$S$ci[2, ]
)
df <- df %>%
dplyr::mutate(id = row_number())
ggplot(df) +
geom_point(aes(x = id, y = mean), size = 4) +
geom_errorbar(aes(x = id, ymin = mean - ci, ymax = mean + ci)) +
#geom_errorbar(aes(x = id, ymin = mean - varPG, ymax = mean + varPG)) +
theme_bw(base_size = 28)
./img/gp_rs_dgemv_comparison.pdf
   | v1  | v2  |
---|-----|-----|
v1 | 1.0 | 0.0 |
v2 | 0.0 | 1.0 |
library(latex2exp)
library(MASS)
n <- 2
sigma <- data.frame(d1 = c(1, 0), d2 = c(0, 1))
means <- c(0, 0)
mv_sample <- mvrnorm(n = 300, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)
library(ggplot2)
library(GGally)
ggpairs(mv_sample) +
theme_bw(base_size = 28)
To get strong correlations, the covariance matrix could be:
   | v1  | v2  |
---|-----|-----|
v1 | 1.0 | 0.8 |
v2 | 0.8 | 1.0 |
library(MASS)
n <- 2
sigma <- data.frame(d1 = c(1, 0.8), d2 = c(0.8, 1))
means <- c(0, 0)
mv_sample <- mvrnorm(n = 600, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)
names(mv_sample) <- paste("V", seq(1, n), sep = "")
library(ggplot2)
library(GGally)
ggpairs(mv_sample) +
theme_bw(base_size = 26)
library(latex2exp)
library(MASS)
n <- 10
sigma <- data.frame(diag(10))
names(sigma) <- paste("d", seq(1, n), sep = "")
means <- rep(0, n)
mv_sample <- mvrnorm(n = 300, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)
names(mv_sample) <- paste("d", seq(1, n), sep = "")
library(ggplot2)
library(GGally)
ggpairs(mv_sample) +
theme_bw(base_size = 22)
library(latex2exp)
library(MASS)
n <- 10
sigma <- data.frame(diag(10))
names(sigma) <- paste("d", seq(1, n), sep = "")
means <- rep(0, n)
mv_sample <- mvrnorm(n = 1000, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)
names(mv_sample) <- paste("d", seq(1, n), sep = "")
library(dplyr)
library(tidyr)
library(ggplot2)
library(latex2exp)
plot_data <- mv_sample
sampled_function <- sample_n(plot_data, 1)
plot_data <- plot_data %>%
gather("x", "f_x") %>%
mutate(x = ordered(x, levels = names(mv_sample)))
sampled_function <- sampled_function %>%
gather("x", "f_x") %>%
mutate(x = ordered(x, levels = names(mv_sample)))
ggplot(plot_data, aes(x = x, y = f_x)) +
geom_jitter(color = "gray48", size = 3, width = 0.25, alpha = 0.2) +
geom_point(data = sampled_function,
aes(color = "Sample of Multivariate Normal"),
size = 4) +
geom_line(data = sampled_function,
color = "red",
size = 1,
alpha = 0.3) +
ylab(TeX("Sampled Values")) +
xlab(TeX("Dimensions")) +
scale_fill_manual("", values = "gray48") +
scale_color_brewer(palette = "Set1") +
theme_bw(base_size = 26) +
theme(legend.title = element_blank(),
legend.background = element_rect(fill = "transparent"),
legend.position = c(0.24, 0.06))
library(dplyr)
library(tidyr)
library(ggplot2)
library(latex2exp)
n <- 10
plot_data <- mv_sample
names(plot_data) <- seq(1, n)
sampled_function <- sample_n(plot_data, 1)
plot_data <- plot_data %>%
gather("x", "f_x") %>%
mutate(x = as.numeric(x))
sampled_function <- sampled_function %>%
gather("x", "f_x") %>%
mutate(x = as.numeric(x))
ggplot(plot_data, aes(x = x, y = f_x)) +
geom_jitter(color = "gray48", size = 3, width = 0.25, alpha = 0.2) +
geom_point(data = sampled_function,
aes(color = "Sampled Function"),
size = 4) +
geom_line(data = sampled_function,
color = "red",
size = 1,
alpha = 0.3) +
ylab(TeX("Samples of $(d_1,\\ldots,d_{10})$ interpreted as $f(x \\in \\lbrack 1,10 \\rbrack)$")) +
xlab(TeX("$(d_1,\\ldots,d_{10})$ interpreted as discrete $x \\in \\lbrack 1,10 \\rbrack$")) +
scale_x_discrete(limits = seq(1, 10)) +
scale_fill_manual("", values = "gray48") +
scale_color_brewer(palette = "Set1") +
theme_bw(base_size = 26) +
theme(legend.title = element_blank(),
legend.background = element_rect(fill = "transparent"),
legend.position = c(0.2, 0.06))
d <- 2
n <- 30
target_X <- expand.grid(x1 = seq(-1, 1, length = n), x2 = seq(-1, 1, length = n))
library(ggplot2)
library(GGally)
ggpairs(target_X) +
theme_bw(base_size = 26)
library(Rcpp)
library(dplyr)
library(MASS)
src <-
"#include <Rcpp.h>
#include <math.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix rcpp_squared_exponential_kernel(NumericMatrix x,
NumericMatrix y,
float amplitude,
NumericVector lengthscale){
NumericMatrix output(x.nrow(), y.nrow());
float distance;
for(int i = 0; i < x.nrow(); i++) {
for(int j = 0; j < y.nrow(); j++) {
distance = 0;
for(int k = 0; k < x.ncol(); k++) {
distance += (((x(i, k) - y(j, k)) *
(x(i, k) - y(j, k))) /
(lengthscale(k) * lengthscale(k)));
}
output(i, j) = amplitude *
amplitude *
exp(-0.5 * distance);
}
}
return(output);
}"
sourceCpp(code = src)
random_design <- function(factors, size) {
data.frame(replicate(factors, runif(size)))
}
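# draw a single realization of a zero-mean Gaussian process at the candidate points,
# i.e. one sample from a multivariate normal with the given covariance matrix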
simulate_gp <- function(covariance_matrix) {
return(unname(mvrnorm(n = 1,
rep(0.0, nrow(covariance_matrix)),
covariance_matrix)))
}
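# covariance matrix of the points in `data` under the squared exponential kernel implemented in Rcpp above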
sqexp_covariance_matrix <- function(data, amplitude, lengthscale) {
return(rcpp_squared_exponential_kernel(as.matrix(data),
as.matrix(data),
amplitude,
lengthscale))
}
significance_probability <- 0.10
amplitude <- 1.0
lengthscale <- rep(0.2, ncol(target_X))  # one lengthscale per input dimension
cov_matrix <- sqexp_covariance_matrix(target_X, amplitude, lengthscale)
plot_data <- target_X
library(DiceKriging)
library(dplyr)
library(RColorBrewer)
library(lattice)
# squared exponential kernel: exp(-(x - x')^2 / (2 * l^2))
se_kernel <- function(x, x_p, l = 0.2) {
return(exp(-(
((x - x_p) ^ 2) /
(2 * l ^ 2)
)))
}
d <- 2
n <- 32
point_grid <- data.frame(x1 = seq(-1, 1, length = n), x2 = rep(0, n))
evaluated_grid <- data.frame(point_grid)
evaluated_grid$y <- se_kernel(point_grid$x1, point_grid$x2)
colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)
ggplot(evaluated_grid, aes(x = x1, y = y)) +
geom_point() +
theme_bw(base_size = 18)
library(lattice)
library(RColorBrewer)
plot_data$y <- simulate_gp(cov_matrix)
colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)
wireframe(y ~ x1 * x2,
data = plot_data,
xlab = "x1",
ylab = "x2",
zlab = list("k(x,x')",
rot = "90"),
#zlim = range(seq(0.0, 1.0, by = 0.5)),
colorkey = FALSE,
col.regions = colors,
drape = TRUE,
#shade = TRUE,
lattice.options = lattice.options(list(border = FALSE)),
scales = list(arrows = FALSE),
screen = list(z = 20, x = -60, y = 0),
par.settings = list(axis.line = list(col = "transparent")))
library(ggplot2)
library(dplyr)
library(DiceKriging)
library(DiceOptim)
gpr_data <- data.frame(x = c(0.1, 0.25, 0.3, 0.67, 0.8),
y = c(0.1, 0.1, 0.6, -0.02, 0.1))
# gpr_data <- data.frame(x = c(0.45, 0.5, 0.55),
# y = c(-0.1, 0.08, 0.2))
x_allowed <- data.frame(x = seq(0, 1, length = 300))
# reg <- km(formula = ~ I(x^2), design = dplyr::select(gpr_data, -y),
# control = list(pop.size = 40,
# BFGSburnin = 4),
# response = gpr_data$y)
reg <- km(design = dplyr::select(gpr_data, -y),
control = list(pop.size = 400,
BFGSburnin = 400),
response = gpr_data$y)
pred <- predict(reg, x_allowed, "UK")
x_allowed$y <- pred$mean
x_allowed$ymin <- pred$mean - (2 * pred$sd)
x_allowed$ymax <- pred$mean + (2 * pred$sd)
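# expected improvement of each candidate point under the fitted Gaussian process model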
x_allowed$ei <- apply(dplyr::select(x_allowed, x), 1, EI, reg)
x_allowed$sampled_y <- simulate(reg, cond = TRUE, newdata = dplyr::select(x_allowed, x))[1, ]
x_allowed$uncond_y <- simulate(reg, cond = FALSE, newdata = dplyr::select(x_allowed, x))[1, ]
ggplot(data = gpr_data, aes(x = x, y = y)) +
geom_ribbon(data = x_allowed, aes(x = x, ymin = ymin, ymax = ymax, fill = "CI of the Mean"), alpha = 0.3) +
geom_line(data = x_allowed, size = 1, aes(color = "Predicted Mean")) +
#geom_line(data = x_allowed, size = 1, aes(x = x, y = sampled_y, color = "Conditioned Sample")) +
#geom_line(data = x_allowed, size = 1, aes(x = x, y = uncond_y, color = "Prior Sample")) +
geom_line(data = x_allowed, size = 1, aes(x = x, y = ei, color = "Expected Improvement")) +
geom_point(stroke = 2, shape = 3, size = 3, aes(color = "Observed")) +
geom_point(data = subset(x_allowed, ei == max(ei)), size = 4, stroke = 2, shape = 3, aes(x = x, y = ei, color = "Maximum EI")) +
scale_color_brewer(palette = "Set1") +
scale_fill_manual("", values = "gray48") +
theme_bw(base_size = 26) +
theme(legend.title = element_blank(),
legend.background = element_rect(fill = "transparent"),
legend.position = c(0.2, 0.2))
library(DiceKriging)
library(dplyr)
library(lattice)
library(RColorBrewer)
constant_kernel <- function(x) {
return(1.0)
}
d <- 2
n <- 16
point_grid <- expand.grid(x1 = seq(0, 1, length = n), x2 = seq(0, 1, length = n))
evaluated_grid <- data.frame(point_grid)
evaluated_grid$y <- apply(point_grid, 1, constant_kernel)
str(evaluated_grid)
colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)
wireframe(y ~ x1 * x2,
data = evaluated_grid,
xlab = "x1",
ylab = "x2",
zlab = list("k(x,x')",
rot = "90"),
zlim = range(seq(0.0, 2.0, by = 0.5)),
colorkey = FALSE,
col.regions = colors,
drape = TRUE,
#shade = TRUE,
lattice.options = lattice.options(list(border = FALSE)),
scales = list(arrows = FALSE),
screen = list(z = 140, x = -60, y = 0),
par.settings = list(axis.line = list(col = "transparent")))
results_data <- select(complete_plot_data,
OMP, SCR, VEC1, VEC2,
RT_I, RT_J, U1_I, T1_I,
T1_J, T2_I, T2_J, U_I, U_J,
cost_mean)
summary(aov(cost_mean ~ .*., results_data))
nobin_results_data <- results_data %>%
filter(OMP == "True", SCR == "True",
VEC1 == "True", VEC2 == "True") %>%
select(-OMP, -VEC1,
-VEC2, -SCR)
summary(aov(cost_mean ~ .*., nobin_results_data))
results_data <- select(complete_plot_data,
OMP, SCR, VEC1, VEC2,
RT_I, RT_J, U1_I, T1_I,
T1_J, T2_I, T2_J, U_I, U_J,
cost_mean)
library(DiceKriging)
library(dplyr)
library(rsm)
library(stringr)
sampled_results_data <- sample_n(results_data, 5000)
sampled_results_data <- sampled_results_data %>% mutate_all(funs(str_replace(., "True", "1"))) %>%
mutate_all(funs(str_replace(., "False", "0"))) %>%
mutate_all(funs(as.integer(.)))
coded_sampled_design <- coded.data(select(sampled_results_data, -cost_mean),
formulas = list(T1_Ie = T1_Ie ~ (T1_I - 5.5) / 5.5,
T1_Je = T1_Je ~ (T1_J - 5.5) / 5.5,
U_Je = U_Je ~ (U_J - 14.5) / 14.5,
U_Ie = U_Ie ~ (U_I - 14.5) / 14.5,
T2_Ie = T2_Ie ~ (T2_I - 5.5) / 5.5,
T2_Je = T2_Je ~ (T2_J - 5.5) / 5.5,
U1_Ie = U1_Ie ~ (U1_I - 14.5) / 14.5,
OMPe = OMPe ~ (OMP - 0.5) / 0.5,
SCRe = SCRe ~ (SCR - 0.5) / 0.5,
VEC1e = VEC1e ~ (VEC1 - 0.5) / 0.5,
VEC2e = VEC2e ~ (VEC2 - 0.5) / 0.5,
RT_Ie = RT_Ie ~ (RT_I - 2.5) / 2.5,
RT_Je = RT_Je ~ (RT_J - 2.5) / 2.5))
reg <- km(design = coded_sampled_design,
response = sampled_results_data$cost_mean)
reg
The paper presents the ε-sticky algorithm, an extension of the ε-greedy algorithm for the Multi-Armed Bandit problem to the selection of Access Points by user stations, and evaluates the performance of the proposed approach in two scenarios that vary in the distribution of user stations. The proposed approach yields significant improvements in user station satisfaction relative to a Strongest Signal Access Point selection algorithm. The paper significantly extends the authors’ previous work, providing a brief introduction to the Multi-Armed Bandit problem and its different types, and a considerably more extensive evaluation and validation of the proposed approach. The experimental methodology is solid, and the proposed approach convincingly outperforms a Strongest Signal selection algorithm in all scenarios studied. The study of the best values of ε was also interesting, since choosing a good value for this parameter has a significant impact on the observed throughput.
3.1.1.2 Applications of Program Autotuning: Call for Proposals for Funding for Undergraduates, Masters, and PhD Students
We are glad to announce a call for masters and PhD scholarship proposals for a research project, carried out in the context of a collaboration between the University of São Paulo and Hewlett Packard Enterprise, starting in February 2020.
The work to be performed by candidates will require 10 hours per week, and will involve the application of techniques for statistical optimization and design of experiments to a problem presented by the candidate. We are looking for students interested in optimizing their applications, or applications used in their work, targeting different metrics, such as performance, memory or network usage.
Ideal problems for this project include problems where the search spaces involved are too large to be explored exhaustively, and have no trivial configuration. Examples of problems include the selection of hyperparameters of Machine Learning algorithms to increase prediction accuracy, selecting compiler flags to improve performance, and configuring user-developed applications to improve user-defined metrics.
We prefer students who have already defined their research problems and made some progress towards measuring and evaluating their objectives, but highly motivated students who are just starting are also welcome to apply.
Up to 5 accepted candidates will receive funding, comparable to scholarships from CAPES, paid by IME/USP for eight months, between February and September 2020. Students currently without funding are preferred.
We invite interested students to submit a CV and a cover letter, summarizing their problem and their interest in applying optimization techniques to it, by February 20th 2020. We assume your advisor is aware of and consents to your participation in this project. Accepted candidates will be invited to a two-day workshop at IME/USP, where techniques and libraries for statistical optimization and design of experiments will be presented, and candidates will be asked to give short presentations about their work. Final acceptance notifications will be sent by February 29th.
Project Coordinator: Prof. Dr. Alfredo Goldman
Project Manager: Pedro Bruel
- https://en.wikipedia.org/wiki/MILEPOST_GCC
- https://github.com/ctuning
- https://github.com/ctuning/ck/wiki
- Ofast
- unsafe optim
- error rate
- Focus on flags at first:
- Explore numerical, categorical parameters later
- LTO
- Which flags should be chosen?
- Baselines:
- O0, …, O3, (Ofast, …?)
- Random sample of flag configurations
- Performance, memory, binary size, …
- Focus on one metric first
- Pareto frontier of the others
- We could try something like a normalized, weighted sum of metrics
- Plan a screening experiment: Identify Main Effects (see the FrF2 sketch after this list)
- Leverage expert knowledge plus screening to plan next experiments
- Low-discrepancy sampling, linear model, ANOVA
- Gaussian Process Regression?
- Define a benchmark
- Firefox
- SPEC
- BLAS Lapack
- Define metrics
- Define the search space
- Flags
- Others
- Start experiments
- Random sample
- Screening
- Linear models + ANOVA
- Linear models + ANOVA + Optimal Design
- Gaussian Process Regression
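As a sketch of the screening step planned above, the FrF2 R package can generate a two-level fractional factorial design for a small set of flag factors; the flag names below are placeholders, not a vetted selection of GCC flags.
library(FrF2)
# Sketch only: a 2^(7-3) (16-run) fractional factorial design for seven
# hypothetical two-level compiler-flag factors.
screening_design <- FrF2(nruns = 16,
                         nfactors = 7,
                         factor.names = list(flag_lto       = c("off", "on"),
                                             flag_unroll    = c("off", "on"),
                                             flag_vectorize = c("off", "on"),
                                             flag_inline    = c("off", "on"),
                                             flag_fast_math = c("off", "on"),
                                             flag_pgo       = c("off", "on"),
                                             flag_omit_fp   = c("off", "on")))
summary(screening_design)
# Each row is one flag configuration to compile and time; main effects can then
# be estimated with lm() or aov() on the measured run times.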
Consider the following function of two factors
\[ f(x_1, x_2) = 2.3 + (1.1 x_1) + (1.6 x_1^2) + (2.2 x_2) + (3.2 x_2^2) + (2.7 x_1 x_2) + ε \]
That is, adding standard Gaussian noise with rnorm, we can write in R:
f <- function(x1, x2) {
return(2.3 + (1.1 * x1) + (1.6 * x1 * x1) +
(2.2 * x2) + (3.2 * x2 * x2) + (2.7 * x1 * x2) + rnorm(length(x1)))
}
# Measuring 5 values of f(x1, x2)
f(rnorm(5), rnorm(5))
The inputs of f are sampled on a regular grid over [-1, 1] in both dimensions:
resolution <- 0.05
sample_grid <- expand.grid(x1 = seq(-1.0, 1.0, by = resolution),
x2 = seq(-1.0, 1.0, by = resolution))
sample_grid$f <- f(sample_grid$x1, sample_grid$x2)
str(sample_grid)
Plotting this low-resolution view of the space, we get:
library(lattice)
library(RColorBrewer)
colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)
wireframe(f ~ x1 * x2,
data = sample_grid,
xlab = "x1",
ylab = "x2",
zlab = list("f(x1,x2)",
rot = "90"),
zlim = range(seq(min(sample_grid$f) - 2.0, max(sample_grid$f) + 2.0, by = 0.5)),
colorkey = FALSE,
col.regions = colors,
drape = TRUE,
#shade = TRUE,
lattice.options = lattice.options(list(border = FALSE)),
scales = list(arrows = FALSE),
screen = list(z = 20, x = -60, y = 0),
par.settings = list(axis.line = list(col = "transparent")))
library(dplyr)
library(AlgDesign)
budget <- 200
sample_rs <- sample_n(sample_grid, budget)
sample_rs$method <- "Random Design"
exploration <- sample_rs
sample_linear_model <- optFederov(~ x1 + x2 + x1:x2,
data = sample_grid,
nTrials = budget)$design
sample_linear_model$method <- "Linear Model Design"
exploration <- bind_rows(exploration,
sample_linear_model)
sample_quadratic_model <- optFederov(~ (x1 + x2) + (x1:x2) + I(x1 ^ 2) + I(x2 ^ 2),
data = sample_grid,
nTrials = budget)$design
sample_quadratic_model$method <- "Quadratic Model Design"
exploration <- bind_rows(exploration,
sample_quadratic_model)
str(exploration)
library(ggplot2)
ggplot(data = exploration, aes(x = x1, y = x2)) +
facet_wrap(method ~ ., ncol = 3) +
geom_point(size = 3) +
theme_bw(base_size = 22)
We now recall the definition of f:
\[ f(x_1, x_2) = 2.3 + (1.1 x_1) + (1.6 x_1^2) + (2.2 x_2) + (3.2 x_2^2) + (2.7 x_1 x_2) + ε \]
If we fit a regression model for f on the points collected by each design, we obtain the following coefficients:
rs_lm <- lm(f ~ x1 * x2 * I(x1 ^ 2) * I(x2 ^ 2), data = filter(exploration, method == "Random Design"))
lin_lm <- lm(f ~ x1 * x2 * I(x1 ^ 2) * I(x2 ^ 2), data = filter(exploration, method == "Linear Model Design"))
quad_lm <- lm(f ~ x1 * x2 * I(x1 ^ 2) * I(x2 ^ 2), data = filter(exploration, method == "Quadratic Model Design"))
coef(rs_lm)
coef(lin_lm)
coef(quad_lm)
rs_lm <- aov(f ~ x1 * x2 + I(x1 ^ 2) + I(x2 ^ 2), data = filter(exploration, method == "Random Design"))
quad_lm <- aov(f ~ x1 * x2 + I(x1 ^ 2) + I(x2 ^ 2), data = filter(exploration, method == "Quadratic Model Design"))
summary.aov(rs_lm)
prediction_rs <- predict(rs_lm, newdata = select(sample_grid, -f))
prediction_error <- data.frame(error = sample_grid[prediction_rs == min(prediction_rs), ]$f - min(prediction_rs))
prediction_error$method <- "Random Design"
prediction_lin <- predict(lin_lm, newdata = select(sample_grid, -f))
prediction_error_lin <- data.frame(error = sample_grid[prediction_lin == min(prediction_lin), ]$f - min(prediction_lin))
prediction_error_lin$method <- "Linear Model Design"
prediction_error <- bind_rows(prediction_error,
prediction_error_lin)
prediction_quad <- predict(quad_lm, newdata = select(sample_grid, -f))
prediction_error_quad <- data.frame(error = sample_grid[prediction_quad == min(prediction_quad), ]$f - min(prediction_quad))
prediction_error_quad$method <- "Quadratic Model Design"
prediction_error <- bind_rows(prediction_error,
prediction_error_quad)
prediction_error
# In R, a function returns the value of its last evaluated expression:
# the result of x + y below is discarded, and paste() provides the return value.
my_function <- function(x, y, z) {
    x + y
    paste(z, collapse = "", sep = "")
}
my_function(2, 3, c("a", "v", "c")) # returns "avc"
- https://github.com/ReScience
- https://www.nature.com/articles/s41562-016-0021
- https://www.semanticscholar.org/paper/A-manifesto-for-reproducible-science-Munaf%C3%B2-Nosek/a68ce412e92d87ca0116519651bbc484d98c76ae
- https://rescience.github.io/faq/
- https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
Conclusions drawn from experimentally obtained data cannot be considered validated until it is possible to reproduce them under independent experimental conditions. This principle guides all scientific progress based on experimental methodologies. Experimental research in Computer Science is in a singular position to reinforce and promote scientific reproducibility, since computational experiments can, in certain cases, be tracked, recorded, and repeated with a precision and control that are practically impossible in fields such as biology and chemistry. Organizations supporting Brazilian science therefore have strong reasons to promote and reinforce scientific reproducibility.
The ACM’s efforts are a good example of the initial steps that can be taken in favor of reproducibility. Several ACM conferences and journals adopt a badge system to mark papers whose efforts towards the reproducibility of their experiments are significant. The nomenclature used by the ACM is derived from the International Vocabulary of Metrology (VIM), and distinguishes between results and conclusions that can be reproduced:
- By the same team, under the same experimental conditions (Repeatability)
- By a different team, under the same experimental conditions (Replicability)
- By a different team, under different experimental conditions (Reproducibility)
The code and data supporting the conclusions of a scientific study should be submitted together with the document to be published. These artifacts are evaluated by the reviewers, and badges are awarded according to the level of reproducibility achieved. Other organizations also promote reproducibility, such as ReScience, which recently organized the Ten Years Reproducibility Challenge, where researchers were encouraged to submit papers reproducing their own results that are at least 10 years old.
Actions such as those taken by the ACM can have a great impact on reinforcing the credibility of the scientific method and on advancing scientific discovery in Brazil.
The paper presents an interesting study of the performance of a redundant operation detection tool, CIDetector. The tool is used to identify regions in the machine code generated by a compiler that contain redundant operations. These regions are then modified to remove redundancies, and the percentage of redundancy reduction and the resulting speedups are reported. The paper is clearly structured and easy to follow.
The paper evaluates the detection tool on 14 programs that compose the CIBenchmark suite, which is also introduced by the paper. CIDetector is tested with 3 GCC versions, 1 ICC version, and 1 LLVM version, by measuring the reduction in redundant operations obtained by modifying the regions detected as producing these kinds of redundancies. In some scenarios, eliminating redundant operations produces execution time speedups.
Strangely, the paper does not provide access to any of its source code or data, but mentions that the code will be open-sourced if the paper is accepted. Besides being an unusual practice, this means I was not able to independently verify or validate any of the data or code presented in the paper.
The paper does not discuss whether the reported results are means over a certain number of executions, and no standard deviation or confidence interval of the mean is presented. It is strongly suggested that these statistics and the corresponding discussion be added to the paper, in order to strengthen its conclusions.
Overall, the paper presents valuable insights and careful evaluation and discussion of the results. I believe this to be a borderline paper, which could be accepted provided the statistical analysis methodology is clearly presented and discussed. I also believe it would be interesting to compare the performance of the code generated by different compilers.
llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments
Pass Arguments: -tti -tbaa -scoped-noalias -assumption-cache-tracker -targetlibinfo -verify -ee-instrument -simplifycfg -domtree -sroa -early-cse -lower-expect
Pass Arguments: -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -called-value-propagation -attributor -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -basicaa -aa -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -libcalls-shrinkwrap -loops -branch-prob -block-freq -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -pgo-memop-opt -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -loop-rotate -licm -loop-unswitch -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -indvars -loop-idiom -loop-deletion -loop-unroll -mldst-motion -phi-values -basicaa -aa -memdep -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -gvn -phi-values -basicaa -aa -memdep -memcpyopt -sccp -demanded-bits -bdce -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -lazy-value-info -jump-threading -correlated-propagation -basicaa -aa -phi-values -memdep -dse -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -licm -postdomtree -adce -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -barrier -elim-avail-extern -basiccg -rpo-functionattrs -globalopt -globaldce -basiccg -globals-aa -float2int -domtree -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -loop-rotate -loop-accesses -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -loop-distribute -branch-prob -block-freq -scalar-evolution -basicaa -aa -loop-accesses -demanded-bits -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -loop-vectorize -loop-simplify -scalar-evolution -aa -loop-accesses -lazy-branch-prob -lazy-block-freq -loop-load-elim -basicaa -aa -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -domtree -loops -scalar-evolution -basicaa -aa -demanded-bits -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -slp-vectorizer -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -loop-unroll -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -licm -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -transform-warning -alignment-from-assumptions -strip-dead-prototypes -globaldce -constmerge -domtree -loops -branch-prob -block-freq -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -block-freq -loop-sink -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instsimplify -div-rem-pairs -simplifycfg -verify
Pass Arguments: -domtree
Pass Arguments: -targetlibinfo -domtree -loops -branch-prob -block-freq
Pass Arguments: -targetlibinfo -domtree -loops -branch-prob -block-freq
[1] "test"
:
[1] 3
:
[1] 3
test 4
library(dplyr)
library(ggplot2)
map_intervals <- function(x, interval_from, interval_to) {
new_x <- x - interval_from[1]
new_x <- new_x / (interval_from[2] - interval_from[1])
new_x <- new_x * (interval_to[2] - interval_to[1])
new_x <- new_x + interval_to[1]
return(new_x)
}
df <- data.frame(x = map_intervals(runif(100), c(0.0, 1.0), c(-1.0, 1.0)),
y = map_intervals(runif(100), c(0.0, 1.0), c(-1.0, 1.0)))
Formulas in R (a small model.matrix check follows the list):
y = (α_1 * x1) + (α_2 * x2)                     ⇔   y ~ x1 + x2
y = (α_1 * x1) + (α_2 * x2) + (α_3 * x1 * x2)   ⇔   y ~ x1 * x2
y = (α_3 * x1 * x2)                             ⇔   y ~ x1:x2
y = (α_1 * x1) + (α_2 * x2) + (α_3 * x1 * x2)   ⇔   y ~ x1 + x2 + x1:x2
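To check these correspondences concretely, the following minimal sketch uses model.matrix() on made-up data to show which columns each formula expands to.
# Minimal sketch: inspect the model terms each formula generates.
dummy_data <- data.frame(x1 = c(0.1, 0.5, 0.9), x2 = c(1.0, 0.2, 0.4))
colnames(model.matrix(~ x1 + x2, dummy_data))  # "(Intercept)" "x1" "x2"
colnames(model.matrix(~ x1 * x2, dummy_data))  # adds the "x1:x2" interaction column
colnames(model.matrix(~ x1:x2, dummy_data))    # intercept and the interaction term only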
- February expenses?
- Scholarship:
- Not an extra request
- Hardware: COVID-related problems
- Student access: a security issue
library(randtoolbox)
library(dplyr)
library(tidyr)
map_intervals <- function(x, interval_from, interval_to) {
new_x <- x - interval_from[1]
new_x <- new_x / (interval_from[2] - interval_from[1])
new_x <- new_x * (interval_to[2] - interval_to[1])
new_x <- new_x + interval_to[1]
return(new_x)
}
n = 216
d = 108
temp_sobol <- sobol(n = n,
                    dim = d,
                    # scrambling = 1,
                    # seed = as.integer((99999 - 10000) * runif(1) + 10000),
                    init = TRUE)
rm(temp_sobol)
design <- data.frame(sobol(n = n,
                           dim = d,
                           # scrambling = 2,
                           # seed = as.integer((99999 - 10000) * runif(1) + 10000),
                           init = FALSE)) %>%
    map_intervals(c(0.0, 1.0), c(1.0, 8.0)) %>%
    round()
design$method <- "Sobol"
rs_design <- data.frame(matrix(runif(n * d), ncol = d, nrow = n)) %>%
map_intervals(c(0.0, 1.0), c(1.0, 8.0)) %>%
round()
rs_design$method <- "Uniform"
design <- bind_rows(design, rs_design) %>%
pivot_longer(-method, names_to = "Layer", values_to = "Bitwidth")
ggplot() +
geom_histogram(data = design,
bins = 8,
aes(x = Bitwidth,
y = ..count..)) +
facet_wrap(. ~ method, ncol = 1)
fatal: destination path 'top500' already exists and is not an empty directory. Already up to date.
'data.frame': 27000 obs. of 52 variables: $ Year : int 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 ... $ Month : int 6 6 6 6 6 6 6 6 6 6 ... $ Day : int 1 1 1 1 1 1 1 1 1 1 ... $ Rank : num 1 2 3 4 5 6 7 8 9 10 ... $ Site : Factor w/ 2467 levels " Institute of Information and Communication Technologies at the Bulgarian Academy of Sciences",..: 1287 1379 1513 1486 1516 135 1507 277 468 563 ... $ Manufacturer : Factor w/ 147 levels "Acer Group","ACTION",..: 141 141 141 141 98 98 141 76 28 28 ... $ Computer : Factor w/ 2825 levels " eServer pSeries 655 (1.7 GHz Power4+)",..: 790 799 798 798 2345 2344 796 995 2730 2730 ... $ Country : Factor w/ 60 levels "Australia","Austria",..: 58 58 58 58 26 7 58 58 58 58 ... $ Processors : num 1024 544 512 512 4 ... $ RMax : num 59.7 30.4 30.4 30.4 23.2 20 15.1 13.9 13.7 13.7 ... $ RPeak : num 131 69.6 65.5 65.5 25.6 ... $ Nmax : num 52224 36864 36864 36864 6400 ... $ Nhalf : num 24064 16384 16384 16384 830 ... $ Processor.Family : Factor w/ 26 levels "","Alpha","AMD",..: 25 25 25 25 21 21 25 14 7 7 ... $ Processor : Factor w/ 463 levels "Alpha","Alpha EV4",..: 267 267 267 267 133 133 267 63 25 25 ... $ Processor.Speed..MHz. : num 32 32 32 32 400 ... $ System.Family : Factor w/ 179 levels " IBM Power Systems",..: 174 174 174 174 122 122 174 104 33 33 ... $ Operating.System : Factor w/ 85 levels "AIX","Amazon Linux 2",..: 10 10 10 10 59 59 10 30 74 74 ... $ Architecture : Factor w/ 6 levels "Cluster","Constellations",..: 3 3 3 3 6 6 3 3 6 6 ... $ Segment : Factor w/ 7 levels "Academic","Classified",..: 6 4 1 2 7 6 6 1 7 6 ... $ Application.Area : Factor w/ 48 levels "","Aerospace",..: 37 37 37 37 37 46 37 37 37 37 ... $ Interconnect.Family : Factor w/ 27 levels "10G","Cray Interconnect",..: 8 8 8 8 3 3 8 17 17 17 ... $ Interconnect : Factor w/ 90 levels "100G Ethernet",..: 37 37 37 37 67 67 37 71 71 71 ... $ Region : Factor w/ 16 levels "Australia and New Zealand",..: 6 6 6 6 4 6 6 6 6 6 ... $ Continent : Factor w/ 5 levels "Africa","Americas",..: 2 2 2 2 3 2 2 2 2 2 ... $ Power : num NA NA NA NA NA NA NA NA NA NA ... $ System.Model : Factor w/ 539 levels "","Acer AR585 F1 Cluster",..: 1 1 1 1 1 1 1 1 1 1 ... $ Total.Cores : num NA NA NA NA NA NA NA NA NA NA ... $ Measured.Size : num NA NA NA NA NA NA NA NA NA NA ... $ Processor.Cores : num NA NA NA NA NA NA NA NA NA NA ... $ Accelerator : Factor w/ 7 levels "","ATI GPU","IBM PowerXCell 8i",..: 1 1 1 1 1 1 1 1 1 1 ... $ Name : Factor w/ 848 levels "","Świerk Computing Centre",..: 1 1 1 1 1 1 1 1 1 1 ... $ Accelerator.Cores : num NA NA NA NA NA NA NA NA NA NA ... $ Efficiency.... : num NA NA NA NA NA NA NA NA NA NA ... $ Mflops.Watt : num NA NA NA NA NA NA NA NA NA NA ... $ Processor.Technology : Factor w/ 26 levels "","AMD x86_64",..: 1 1 1 1 1 1 1 1 1 1 ... $ OS.Family : Factor w/ 6 levels "","BSD Based",..: 1 1 1 1 1 1 1 1 1 1 ... $ Cores.per.Socket : num NA NA NA NA NA NA NA NA NA NA ... $ Processor.Generation : Factor w/ 75 levels "","AMD Naples",..: 1 1 1 1 1 1 1 1 1 1 ... $ Previous.Rank : num NA NA NA NA NA NA NA NA NA NA ... $ First.Appearance : num NA NA NA NA NA NA NA NA NA NA ... $ First.Rank : num NA NA NA NA NA NA NA NA NA NA ... $ Accelerator.Co.Processor.Cores : num NA NA NA NA NA NA NA NA NA NA ... $ Accelerator.Co.Processor : Factor w/ 59 levels "","AMD FirePro S10000",..: 1 1 1 1 1 1 1 1 1 1 ... $ Power.Source : Factor w/ 4 levels "","Derived","Optimized",..: 1 1 1 1 1 1 1 1 1 1 ... $ Rmax..TFlop.s. : num NA NA NA NA NA NA NA NA NA NA ... $ Rpeak..TFlop.s. 
: num NA NA NA NA NA NA NA NA NA NA ... $ HPCG..TFlop.s. : num NA NA NA NA NA NA NA NA NA NA ... $ Power..kW. : num NA NA NA NA NA NA NA NA NA NA ... $ Power.Effeciency..GFlops.Watts.: num NA NA NA NA NA NA NA NA NA NA ... $ Site.ID : num NA NA NA NA NA NA NA NA NA NA ... $ System.ID : int NA NA NA NA NA NA NA NA NA NA ...
library(ggplot2)
ggplot() +
geom_jitter(data = df,
alpha = 0.5,
height = 0.0,
size = 1.5,
aes(x = Year,
y = Processor.Speed..MHz. / 1000,
color = cut(Rank,
breaks = c(1, 167, 334, 500),
include.lowest = TRUE))) +
#scale_y_log10() +
scale_x_continuous(breaks = function(x) { seq(floor(min(x)),
ceiling(max(x)),
4) }) +
scale_color_brewer(name = "TOP500 Rank", palette = "Set1") +
ylab("Processor Speed (GHz)") +
theme_bw(base_size = 27) +
theme(legend.position = c(0.25, 0.95),
legend.direction = "horizontal",
legend.background = element_rect(fill = "transparent", colour = NA),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15)) +
guides(color = guide_legend(nrow = 3, override.aes = list(alpha = 1.0, size = 4)))
library(ggplot2)
ggplot() +
geom_jitter(data = df,
alpha = 0.5,
height = 0.0,
size = 1.5,
aes(x = Year,
y = Nmax,
color = cut(Rank,
breaks = c(1, 167, 334, 500),
include.lowest = TRUE))) +
scale_x_continuous(breaks = function(x) { seq(floor(min(x)),
ceiling(max(x)),
4) }) +
scale_color_brewer(name = "TOP500 Rank", palette = "Set1") +
ylab("Problem Size to Reach RMax") +
scale_y_log10() +
theme_bw(base_size = 27) +
theme(legend.position = c(0.25, 0.95),
legend.direction = "horizontal",
legend.background = element_rect(fill = "transparent", colour = NA),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15),
axis.text.y = element_text(angle = 90, hjust = 0.5)) +
guides(color = guide_legend(nrow = 3, override.aes = list(alpha = 1.0, size = 4)))
library(ggplot2)
library(dplyr)
library(tidyr)
plot_df <- df %>%
mutate(RMax = RMax / 1e3,
RPeak = RPeak / 1e3,
RMaxT = coalesce(RMax, Rmax..TFlop.s.),
RPeakT = coalesce(RPeak, Rpeak..TFlop.s.)) %>%
select(Rank,
Year,
RMaxT,
RPeakT,
HPCG..TFlop.s.) %>%
gather(-Rank, -Year, key = "Type", value = "Count") %>%
mutate(Type = factor(Type,
levels = c("RPeakT",
"RMaxT",
"HPCG..TFlop.s."),
labels = c("RPeak (HPL)",
"RMax (HPL)",
"RMax (HPCG)"))) %>%
filter(is.finite(Count))
ggplot() +
geom_jitter(data = plot_df,
alpha = 0.5,
height = 0.0,
size = 1.5,
aes(x = Year,
y = Count,
color = cut(Rank,
breaks = c(1, 167, 334, 500),
include.lowest = TRUE))) +
scale_x_continuous(breaks = function(x) { seq(floor(min(x)),
ceiling(max(x)),
6) }) +
scale_color_brewer(name = "TOP500 Rank", palette = "Set1") +
ylab("Performance (TFlops/s)") +
scale_y_log10() +
theme_bw(base_size = 27) +
theme(legend.position = c(0.83, 0.09),
legend.direction = "horizontal",
legend.background = element_rect(fill = "transparent", colour = NA),
legend.text = element_text(size = 15),
legend.title = element_text(size = 15),
axis.text.y = element_text(angle = 90, hjust = 0.5)) +
guides(color = guide_legend(nrow = 3, override.aes = list(alpha = 1.0, size = 4))) +
facet_wrap(. ~ Type, ncol = 3)
library(dplyr)
library(ggplot2)
df <- read.csv("data/0e/ce81f1-5ade-41e4-b217-c0d157e93922/100_depth.csv")
ggplot(df) +
geom_col(aes(x = depth,
y = occurrences)) +
theme_bw(base_size = 28)
library(dplyr)
library(ggplot2)
df <- read.csv("data/0e/ce81f1-5ade-41e4-b217-c0d157e93922/100_depth.csv")
ggplot(df, aes(x = depth,
y = occurrences)) +
geom_point() +
geom_smooth(formula = y ~ I(1 / x),
method = "lm") +
theme_bw(base_size = 28)
The paper compares the performance of Julia with that of the pthreads C API using atomic operations and simple synchronization tasks, and also compares the performance of Julia and that of the MPI C++ API in microbenchmarks for communication, reads, and writes.
The paper presents an implementation of HPCG, a standard, communication-heavy, and large scale benchmark for HPC.
The paper argues that the Julia HPCG implementation outperforms the C++ MPI implementation in certain experimental conditions, and that Julia is capable of performing certain synchronization operations faster than the pthreads API.
The paper is clearly written and easy to read. It reports its statistical analysis and experimental design choices, such as the number of repeated measurements and the criteria under which some outliers were removed. The paper also presents interesting discussions, such as on the underlying causes of each observation and on some of the implementation choices for HPCG.
The paper does not make available any artifacts for the verification of data, code, or statistical analysis, mentioning these will be made available after acceptance. Under these conditions it is impossible to verify or attempt to reproduce results. The claims that the Julia language is capable of outperforming the C++ MPI implementation on HPCG seem rushed, given the data presented.
I believe that a more comprehensive experimental analysis of the performance of Julia in parallel and distributed settings should be performed to be able to claim improvements over well established technologies such as MPI and pthreads.
Although motivating and interesting, the microbenchmarks and small-scale threading comparisons do not provide a solid basis for claiming improvements, and the same is valid for HPCG. Further work should aim to compile a comprehensive set of large-scale, possibly real-world, applications written in Julia and C++ MPI.
The paper mentions that the variance of the HPCG measurements was large for the Julia code. Adding the confidence intervals of these measurements, and of those in the other graphs presented, would enrich and better support the discussion. A usual choice is an interval of twice the standard error of the mean, which yields an approximately 95% confidence interval.
- Add the 95% confidence intervals for the mean, on all plots, but especially for Figure 10
- Perform performance evaluations on a more comprehensive benchmark, in order to better support claims of improvement
- “Figure III-A shows the results.” ~> Should be “Figure 4”, at page 6
- “perfromance” ~> “performance”, at page 10
The experimental validation of the approaches studied during this thesis, namely Design of Experiments and Gaussian Process Regression, took longer than expected in the initial SPAPT benchmark. The large set of problems and number of configurable parameters slowed down a comprehensive and careful analysis of the relationships between parameters and performance for many problems.
The Gaussian Process Regression approach introduced further questions and restrictions for experimental validation, and the method was also applied to a different domain, the selection of bit precision for layers of Deep Neural Networks, where completing a single experimental run takes a full week.
I have already started my efforts towards writing the thesis, but our current assessment of the state of the work indicates that I most likely will not be able to finish before the end of the third year’s limit date.
Compounding the experimental difficulties sustained during the last year, the difficulties generated from the COVID-19 pandemic greatly hindered the pace of my work, as they did for many others. I continue my cotutelle from Brazil, a country that is at the moment the epicenter of the current pandemic. During the final stages of my thesis, I had to relocate to a different city and
Title: ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics
- Weak reject (I don’t want to publish this)
- I have published/worked on most/all this paper’s topics
The paper presents an autotuner for 44 parameters of the Spark data analytics framework. The optimization approach is a composite of Random Forests to identify the most significant parameters, Latin Hypercube Sampling to better cover the search space, and Gaussian Process Regression to determine configurations to test, balancing exploitation of detected relationships and exploration of the search space. The paper performs experimental evaluation on five SparkBench workloads with three datasets each. Although the proposed approach achieves small but statistically significant improvements in relation to the optimization methods compared in the study, the paper does not present performance or optimization baselines. The paper does not compare the achieved performance to the performance of default, or sensibly picked, Spark configurations. The results of the optimizers are also not compared to the best configurations found by naive methods such as Random Search.
- Adapts well-established optimization techniques from Bayesian Optimization and Statistical Learning to the Spark tuning problem, exploring better-performing regions of search spaces and decreasing tuning costs with respect to other tuning strategies
- Proposes an interesting high-dimensional optimization problem, with around 200 dimensions, but the actual studied problem targets a more tractable search space, one order of magnitude smaller, with only 44 parameters. The paper does not mention the number of possible configurations
- Performance improvements over established work seem to be statistically significant although small, but the paper does not compare the results with proper baselines. Performance is never compared to the sensible default Spark configurations for each workload, or to optimizations from more naive methods
- The paper does not compare the performance achieved by autotuning strategies to the performance achieved by more naive optimization strategies using the same budget, such as Random Search, which could also be based on LHS. Without optimization and performance baselines it is difficult to measure the effectiveness of the autotuners
- The comparisons between the proposed system and others are not balanced. The study allows the proposed system to reuse results from previous runs in two of the three datasets for each problem, which increases the effective experimental budget and the optimization time used by the technique. The author should modify figures to reflect this imbalance
- Page 8: “For ROBOTune, we treat each workload to be an unseen one for the
first dataset, and for the remaining two, ROBOTune uses cached parameters and
configurations through memoized sampling.”
- This means that the optimization time and budget comparisons presented in Figures 4 and 6 are not balanced between methods. If the proposed approach reuses measurements, its optimization costs must be updated accordingly
- Page 3: “Eventually, it [Bayesian Optimization] locates the global extremum
for the objective function, or nearly so within a limited number of
observations.”
- Finding the global optimum in a limited number of measurements is not a guarantee of Bayesian Optimization
- Page 5: “Furthermore, LHS is dimension agnostic, meaning that the number of
samples required is not tied to the dimensionality of the configuration
space.”
- Obtaining an effective search space covering using Latin Hypercube Sampling is tied to the problem dimension. A sample with only a handful of LHS points in a high-dimensional space suffers from the same issues as a uniform sample, that is, most samples end up located in the grid regions near the “shell” of the search space
- Wrong date on page headers: “Conference’17, July 2017”
- Page 5: “Another way of computing feature importances is through tree-based
estimators. Unlike linear models, they apply to non-linear relationships.”
- Linear models can represent non-linear relationships between factors; the restriction to linear, or linearizable, relationships applies only to the estimated parameters
- Page 5: “We compare the coefficient of determination (R2) for two linear and
two tree-based models in Figure 2.”
- It is hard to compare the qualities of fit of the models in Figure 2 without knowing the underlying models for the Lasso and the Lasso + Ridge (ENet) approaches, since adding polynomial terms would increase flexibility of these approaches and consequently increase R^2
- The reference could be improved by adding publisher information: “[11] L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. 1984. Classification and Regression Trees. Taylor & Francis. https://books.google.com/books?id=JwQx-WOmSyQC”
The response did not address my most concerning comments satisfactorily, and I recommend rejecting the paper. My comments regard mostly the experimental validation of the approach. Detailed comments are listed below.
> Comment-72E-2: Comparison to the default/naive?
>
> We compared our results to the default in Section-5.2. It resulted in frequent
> Out-Of-Memory/Runtime errors. We need 5 runs for each dataset, each run taking
> up to ~10 hours. Due to time constraints we chose to focus on established
> methods.
A fair comparison baseline would be a configuration considered good enough for each dataset, that is, a configuration used in other work, well established in the industry for this type of data, or at least one that does not result in frequent errors. An analogous example is GCC’s “-O3” option, a common baseline in work on compiler optimization. Gains over sensible defaults could then be properly assessed. Another fair option for a baseline would be the best configuration found by a random search using the same experimental budget. The paper should be resubmitted after sufficient data is collected.
> Comment-72E-3: Comparison not balanced.
>
> This could be a misunderstanding. In Section-5.3 we exclude the initial one-time
> cost of parameter selection as this is a fixed cost. This overhead varies with
> the number of datasets tuned, 3 used in our evaluation. Using more (5 or 10)
> will change the overhead.
In my previous comment I argued under the impression that the proposed method leveraged cached results from previous runs. If that is the case, the comparisons between approaches must consider the experimental cost of caching these initial runs as part of the optimization cost. The cost of these initial evaluations, that are then cached and reused, should be included in the total optimization cost of the approach. If the method does not in fact reuse cached results in further optimization, please ignore my comment.
'data.frame': 23120 obs. of 9 variables: $ elements_number : int 3 2 4 2 2 2 2 4 4 3 ... $ y_component_number: int 3 2 1 1 1 2 2 2 4 1 ... $ vector_length : int 4 1 4 1 8 2 1 8 16 4 ... $ temporary_size : int 4 2 2 2 2 2 4 4 2 4 ... $ load_overlap : num 2 1 2 1 2 2 1 2 2 2 ... $ threads_number : int 64 128 64 256 128 128 128 64 128 32 ... $ lws_y : int 64 1 32 64 32 8 2 2 128 32 ... $ time_per_pixel : num 1.11e-08 1.58e-10 2.34e-09 1.39e-09 3.40e-09 ... $ row_number : int 1 2 3 4 5 6 7 8 9 10 ...
library(GGally)
library(dplyr)
library(grid)
library(patchwork)
library(gridExtra)
gpr_data <- read.csv("dopt_anova_experiments/data/gpr_nugget_best_points.csv") %>%
select(-slowdown, -time_per_pixel, -measurement_order,
-experiment_id, -X, -row_number) %>%
mutate(method = "GPRN")
# names(gpr_data) <- c("slowdown", "method", "point_number", "time_per_pixel")
gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_nosd_best_points.csv") %>%
select(-slowdown, -time_per_pixel, -measurement_order,
-experiment_id, -X, -row_number) %>%
mutate(method = "GPR")
# names(gpr_lin_data) <- c("slowdown", "method", "point_number", "time_per_pixel")
df_all_methods <- read.csv("dopt_anova_experiments/data/complete_1000.csv",
strip.white = TRUE, header = TRUE) %>%
select(-slowdown, -point_number, -vector_recompute, -time_per_pixel)
levels <- c("RS", "LHS", "GS", "GSR",
"GA","LM", "LMB", "LMBT",
"RQ", "DOPT", "DLM", "DLMT",
"GPR", "GPRN")
selected_methods <- c("RS", "LHS", "GS", "GSR",
"GA", "LM", "DLMT", "GPR", "GPRN")
df_all_methods <- df_all_methods %>%
mutate(load_overlap = as.numeric(as.factor(load_overlap))) %>%
bind_rows(gpr_data, gpr_lin_data) %>%
mutate(method = factor(method,
levels = levels)) %>%
filter(method %in% selected_methods) %>%
group_by(method) %>%
ungroup()
gprn <- ggpairs(df_all_methods %>%
filter(method == "GPRN") %>%
select(-method)) +
ggtitle("GPRN: Trend + Nugget")
gpr <- ggpairs(df_all_methods %>%
filter(method == "GPR") %>%
select(-method)) +
ggtitle("GPR: Trend")
dlmt <- ggpairs(df_all_methods %>%
filter(method == "DLMT") %>%
select(-method)) +
ggtitle("DLMT")
dlm <- ggpairs(df_all_methods %>%
filter(method == "LM") %>%
select(-method)) +
ggtitle("LM")
p1 <- grid.grabExpr(print(dlm))
p2 <- grid.grabExpr(print(dlmt))
p3 <- grid.grabExpr(print(gpr))
p4 <- grid.grabExpr(print(gprn))
grid.arrange(p1, p2, p3, p4, ncol = 2)
library(ggplot2)
library(dplyr)
#gpr_data <- read.csv("dopt_anova_experiments/data/gpr_nugget_best_points.csv") %>%
# gpr_data <- read.csv("gpr_dampening_sc30_d05_best_points.csv") %>%
# select(slowdown, method, measurement_order, time_per_pixel) %>%
# mutate(method = "GPRD")
#
# names(gpr_data) <- c("slowdown", "method", "point_number", "time_per_pixel")
#gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_nosd_best_points.csv") %>%
gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_03sd_nugget_best_points.csv") %>%
select(slowdown, method, measurement_order, time_per_pixel) %>%
mutate(method = "GPR")
names(gpr_lin_data) <- c("slowdown", "method", "point_number", "time_per_pixel")
df_all_methods <- read.csv("dopt_anova_experiments/data/complete_1000.csv",
strip.white = TRUE, header = TRUE)
levels <- c("RS", "LHS", "GS", "GSR", "GA", "LM", "RQ", "DLMT", "GPR")
labels <- c("RS", "LHS", "GS", "GSR", "GA", "LM", "QR", "DLMT", "GPR")
df_all_methods <- df_all_methods %>%
select(slowdown, method, point_number, time_per_pixel) %>%
bind_rows(gpr_lin_data) %>%
mutate(method = factor(method,
levels = levels,
labels = labels)) %>%
filter(method %in% labels) %>%
group_by(method) %>%
mutate(mean = mean(slowdown),
median = median(slowdown),
ci95 = 1.96 * sd(slowdown) / sqrt(n()),
max = max(slowdown)) %>%
ungroup()
ggplot(df_all_methods) +
facet_grid(method ~ .) +
theme_bw(base_size = 15) +
coord_cartesian(xlim = c(.9, 4),
ylim = c(0, 1000)) +
geom_histogram(aes(slowdown),
binwidth = .05,
fill = "gray48") +
scale_y_continuous(breaks = c(0, 1000),
labels = c("0", "1000")) +
geom_curve(aes(x = max + .1,
y = 500,
xend = max,
yend = 5),
arrow = arrow(length = unit(0.05, "npc")),
curvature = 0.3,
stat = "unique") +
geom_text(aes(x = max + .2,
y = 550,
label = "max"),
stat = "unique") +
geom_rect(aes(xmin = mean - ci95,
xmax = mean + ci95,
ymin = 0,
ymax = 1000,
fill = "red"),
alpha = 0.3,
stat = "unique") +
geom_vline(aes(xintercept = median),
color = "darkgreen",
linetype = 3,
stat = "unique") +
geom_vline(aes(xintercept = mean),
color = "red",
linetype = 2,
stat = "unique") +
labs(y = "Count",
x = "Slowdown compared to the optimal solution") +
scale_fill_discrete(name = "",
breaks = c("red"),
labels = c("Mean error")) +
ggtitle("") +
theme(legend.position = "none",
strip.background = element_rect(fill = "white"))
library(xtable)
library(dplyr)
gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_03sd_nugget_best_points.csv") %>%
select(slowdown, method, measurement_order, time_per_pixel) %>%
mutate(method = "GPR")
names(gpr_lin_data) <- c("slowdown", "method", "point_number", "time_per_pixel")
df_all_methods = read.csv("./dopt_anova_experiments/data/complete_1000.csv",
strip.white = T, header = T) %>%
select(slowdown, method, point_number) %>%
bind_rows(gpr_lin_data) %>%
filter(method %in% c("RS", "LHS", "GS", "GSR", "GA", "LM",
"DLMT", "RQ", "GPR")) %>%
mutate(method = factor(method,
levels = c("RS", "LHS", "GS", "GSR", "GA", "LM",
"RQ", "DLMT", "GPR"),
labels = c("Random Sampling (RS)",
"Latin Hypercube Sampling (LHS)",
"Greedy Search (GS)",
"Greedy Search w. Restart (GSR)",
"Genetic Algorithm (GA)",
"Linear Model (LM)",
"Quantile Regression (QR)",
"D-Opt., Linear Model w. Transform (DLMT)",
"Gaussian Process Regression w. EI (GPR)")))
summaries = df_all_methods %>%
group_by(method) %>%
summarise(method = method,
mean_slowdown = mean(slowdown),
min_slowdown = min(slowdown),
max_slowdown = max(slowdown),
mean_budget = mean(point_number),
max_budget = max(point_number),
.groups = "keep") %>%
distinct() %>%
ungroup()
cap <- paste("Slowdown and budget used by 7 optimization methods on",
"the Laplacian Kernel, using a budget of 125 points with 1000 repetitions")
x <- xtable(summaries,
caption = cap,
digits = 2,
label = "tab:gpu_laplacian_compare_budget")
align(x) <- xalign(x)
display(x) <- xdisplay(x)
header = paste(paste("Method", "Mean", "Min.", "Max.", "Mean", "Max.",
sep = " & "), "\\\\ \n \\midrule \n")
super_header = paste("\\toprule \n",
"& \\multicolumn{3}{c}{Slowdown} & \\multicolumn{2}{c}{Budget}",
"\\\\ \n")
bottom = "\\bottomrule\n"
print(x,
size = "\\small",
booktabs = TRUE,
include.rownames = FALSE,
include.colnames = FALSE,
hline.after = NULL,
add.to.row = list(pos = as.list(c(-1,
0,
7,
nrow(summaries))),
command = c(super_header,
header,
"\\rowcolor{red!25}",
bottom)),
math.style.exponents = TRUE,
table.placement = "b",
caption.placement = "top")
- https://www.jmp.com/support/help/en/15.2/index.shtml#page/jmp/gaussian-process-imse-optimal-designs.shtml#ww90928
- https://sites.ualberta.ca/~dwiens/home%20page/publist.htm
- https://sites.ualberta.ca/~dwiens/home%20page/pubs/lof%20discrete.pdf
- https://bookdown.org/rbg/surrogates/
- Convex or pseudo-convex functions: briefly, when the learning rates η decrease at an appropriate rate, and subject to relatively mild assumptions, stochastic gradient descent converges almost surely to a global minimum (a toy SGD sketch follows this list)
- Lipschitz guarantee?
- Many variants (momentum to avoid zig-zagging, backtracking line search so that the objective strictly decreases, conjugate gradient or Newton’s method if the Hessian is available, Nelder-Mead if derivatives cannot be computed)
- C^2: converges almost surely to a local minimum
- Meta-heuristics like Hill Climbing, Simulated Annealing (stochastic neighborhood exploration, related to https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm), and Genetic Algorithms
- Constrained optimization (requires Lagrangian methods unless a change of variables helps). Additional difficulty: avoiding getting stuck on the boundary of the feasible region and following it with very small steps.
Convergence rate degrades with dimension and bad normalization.
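As a toy illustration of the decreasing-learning-rate condition mentioned above, the following sketch runs stochastic gradient descent on a noisy one-dimensional quadratic; the objective and the 1/t step-size schedule are made up for illustration.
# Toy SGD sketch: minimize (theta - 3)^2 observing only a noisy gradient,
# with a learning rate decreasing as 1/t (Robbins-Monro conditions).
set.seed(42)
noisy_gradient <- function(theta) 2 * (theta - 3) + rnorm(1, sd = 0.5)
theta <- 0
for (t in 1:5000) {
    eta <- 1 / t
    theta <- theta - eta * noisy_gradient(theta)
}
theta # should end up close to the minimizer, 3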
- Statistical learning: optimize the loss or the maximum likelihood for the model parameters. In general it seeks a good prediction “everywhere”, and tries to evaluate uncertainty on parameters and predictions.
- “Machine” learning: efficient optimization of the loss (i.e., E_{x,θ}[y_θ(x) − y(x)])
Statistical and machine learning generally assume that the samples x are fixed inputs (observations vs. controlled experiments), which raises difficulties regarding generalization/overfitting, discriminating causation from correlation, counterfactual analysis, etc.
Assuming a model for f, propose a set of x to measure in order to obtain the best estimates for the model parameters θ and for the prediction (a small D-optimal sketch follows the list below).
- Linear models
- Fractional Designs
- Screening
- D-optimal designs
- Unconstrained models (Gaussian process)
- LHS
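As a rough sketch for the D-optimal item above, optFederov() from AlgDesign (used earlier for the two-factor example) can select a D-optimal subset of an arbitrary candidate grid for a full quadratic model; the grid resolution and budget below are arbitrary choices.
library(AlgDesign)
# Rough sketch: a 12-run D-optimal design for a quadratic model in two factors,
# selected from an arbitrary candidate grid.
candidates <- expand.grid(x1 = seq(-1, 1, by = 0.25),
                          x2 = seq(-1, 1, by = 0.25))
d_optimal <- optFederov(~ x1 + x2 + I(x1 ^ 2) + I(x2 ^ 2) + x1:x2,
                        data = candidates,
                        nTrials = 12)
d_optimal$design # the 12 selected runs
d_optimal$D      # D criterion value reported by optFederov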
Choose x_i depending on previous observations and obtain feedback. Same rules: interact with the world and get feedback, and try to find the “best” configuration.
- Online machine learning: regret from 1 to T, no discount. Often
relies on stochastic gradient.
- Special simple case = Bandit
- Stochastic: UCB/Thompson, log(T)
- Adversarial: EXP3, sqrt(T)
- Many variants Linear UCB, LinRel, Gaussian UCB
- Special simple case = Bandit
- Reinforcement learning: optimize (infinite discounted) regret,
generally through Markov Decision Process (i.e., dynamic
programming).
- Well studied for finite-state space MDPs
- Does not scale well in high dimension
In both cases, the balance between exploration and exploitation is led by (temporally aggregated) regret (a toy UCB1 sketch follows).
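As a toy illustration of the stochastic bandit case (UCB, logarithmic regret), the following UCB1 loop plays a few simulated Gaussian arms; the arm means, noise level, and horizon are made up.
# Toy UCB1 sketch: balance the empirical mean of each arm (exploitation)
# against an upper confidence bonus (exploration).
set.seed(1)
arm_means <- c(0.2, 0.5, 0.45, 0.7) # unknown to the algorithm
horizon <- 500
pulls <- rep(0, length(arm_means))
rewards <- rep(0, length(arm_means))
for (t in seq_len(horizon)) {
    if (any(pulls == 0)) {
        arm <- which(pulls == 0)[1] # play each arm once first
    } else {
        ucb <- (rewards / pulls) + sqrt(2 * log(t) / pulls)
        arm <- which.max(ucb)
    }
    reward <- rnorm(1, mean = arm_means[arm], sd = 0.1)
    pulls[arm] <- pulls[arm] + 1
    rewards[arm] <- rewards[arm] + reward
}
pulls           # most pulls should go to the best arm
rewards / pulls # empirical mean of each arm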
- Reduce dimension by exploiting knowledge of the geometry of f (e.g., identify insignificant parameters or parameters that need to be set to “obvious” values).
- EGO: Jones, Schonlau, and Welch (1998), “Efficient Global Optimization of Expensive Black-Box Functions”, Journal of Global Optimization, 13(4), 455–492. EGO targets the minimization of the expected deviation from the extremum of the studied function.
- EI, Kriging Believer (no guarantee), and Gaussian UCB differ by the criterion used to choose the next point to evaluate (a minimal EI-driven sketch follows)
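The following is a bare-bones sketch of an EI-driven loop in the spirit of EGO, reusing km() from DiceKriging and EI() from DiceOptim as in the code above; the one-dimensional objective, initial design, and number of iterations are arbitrary illustrative choices.
library(DiceKriging)
library(DiceOptim)
# Bare-bones EGO-style loop: fit a GP, maximize Expected Improvement over a
# candidate grid, evaluate the chosen point, and refit.
f_toy <- function(x) sin(10 * x) + x # arbitrary objective to minimize
design <- data.frame(x = c(0.1, 0.4, 0.8))
response <- f_toy(design$x)
grid <- data.frame(x = seq(0, 1, length.out = 200))
for (iteration in 1:5) {
    model <- km(design = design, response = response)
    ei <- apply(grid, 1, EI, model) # Expected Improvement at each candidate
    x_next <- grid$x[which.max(ei)]
    design <- rbind(design, data.frame(x = x_next))
    response <- c(response, f_toy(x_next))
}
best <- which.min(response)
c(x = design$x[best], y = response[best]) # best point found so far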
- https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex
- https://cs.stackexchange.com/questions/14007/random-sampling-in-a-polygon
- List of publications on Maxmin Designs for Lack of Fit, Regression, etc
- Model-robust designs for nonlinear quantile regression
- Maximin power designs in testing lack of fit
- Designs for approximately linear regression: Two optimality properties of uniform designs
- Gaussian Process IMSE Optimal Designs
- Efficient Global Optimization (EGO): Bayesian Optimization
- EGO Original Paper
The experimental validation of the approaches studied during this thesis, namely Design of Experiments and Gaussian Process Regression, took longer than expected in the initial SPAPT benchmark. The large set of problems and number of configurable parameters slowed down a comprehensive and careful analysis of the relationships between parameters and performance for many problems.
The Gaussian Process Regression approach introduced further questions and restrictions for experimental validation, and the method was also applied to a different domain, the selection of bit precision for layers of Deep Neural Networks, where completing a single experimental run takes a full week.
I have already started my efforts towards writing the thesis, but our current assessment of the state of the work indicates that I most likely will not be able to finish before the end of the third year’s limit date. We intend to finish writing the thesis by the end of December 2020, and consequently to schedule the thesis defense for March 2021.
The contribution of this thesis is an evaluation of the effectiveness of established search, learning, and statistical methods for optimization that had nevertheless not been comprehensively applied and evaluated in the specific domain of optimization and performance modeling of computer programs.
We initially applied a Design of Experiments approach to model and optimize the performance of the code produced by frameworks for source-to-source transformation. One of the targets was the SPAPT benchmark suite, consisting of several programs with configurable parameters. We published our initial exploration and optimization of these programs in 2019, but the more extensive subsequent study, involving the explicit identification of significant parameters for each kernel and the exploration of different modeling assumptions, such as quantile regression, took more time than expected.
In parallel to our Design of Experiments study, we also applied Gaussian Process Regression, a different category of optimization method, to a program optimization problem in a different application domain, namely the adaptive selection of bit precision for each layer of Deep Neural Networks. This experiment involved comparing our optimization method with a reference Reinforcement Learning algorithm. Experiments involve training and inference steps, which are necessary for model validation, and in experimental scenarios such as the ones studied in this work, a single experimental run takes up to a week to complete. Our exploration of the particular conditions that make Gaussian Process Regression methods perform well in this problem domain also took longer than expected. We studied sampling strategies that enabled local exploration of promising points, and compared the adequacy of different acquisition functions for determining such points. We have also revisited previous work on our source-to-source transformation benchmark on GPUs, applying and studying this optimization method in that context as well.
Thesis writing has started and is planned to conclude by December 2020. The defense will then take place in March 2021.
- Vinicius’ payment
- Nathan’s payment
- Cotutelle extension agreement @ USP
- Attestation of funding until March for my enrollment at UGA
- Papers: Luciano, Carlos (iWAPT), Giuliano (?)
- Thesis
- HPE
'data.frame': 38 obs. of 19 variables: $ _use_fast_math : chr "on" "on" "on" "on" ... $ _opt_level_ : int 2 2 2 2 2 2 2 2 2 3 ... $ _gpu_architecture_ : chr "sm_50" "sm_50" "sm_50" "sm_50" ... $ _ftz_ : chr "false" "false" "false" "false" ... $ _force_store_cache_ : chr "cg" "cg" "cg" "cg" ... $ _force_load_cache_ : chr "cs" "cs" "cs" "cs" ... $ _allow_expensive_optimizations_: chr "false" "false" "false" "false" ... $ _fmad_ : chr "true" "true" "true" "true" ... $ _optimize_ : int 2 2 2 2 2 2 2 2 2 3 ... $ _preserve_relocs : chr "on" "on" "on" "on" ... $ _relocatable_device_code_ : chr "false" "false" "false" "false" ... $ _prec_div_ : chr "true" "true" "true" "true" ... $ _maxrregcount_ : int 32 32 32 32 32 32 32 32 32 63 ... $ _prec_sqrt_ : chr "false" "false" "false" "false" ... $ _no_align_double : chr "on" "on" "on" "on" ... $ response : num 3.32 3.32 3.32 3.32 3.33 ... $ experiment_id : chr "Model Prediction (GAU)" "Model Prediction (GAU)" "Model Prediction (GAU)" "Model Prediction (GAU)" ... $ experiment_type : chr "screening_prediction" "screening_prediction" "screening_prediction" "screening_prediction" ... $ GPU : chr "Quadro M1200" "Quadro M1200" "Quadro M1200" "Quadro M1200" ... 'data.frame': 70 obs. of 6 variables: $ _opt_level_ : int 2 2 2 2 2 2 2 2 2 2 ... $ _gpu_architecture_: chr "sm_50" "sm_50" "sm_50" "sm_50" ... $ response : num 3.34 3.34 3.34 3.34 3.34 ... $ experiment_id : chr "Baseline (GAU)" "Baseline (GAU)" "Baseline (GAU)" "Baseline (GAU)" ... $ experiment_type : chr "baseline" "baseline" "baseline" "baseline" ... $ GPU : chr "Quadro M1200" "Quadro M1200" "Quadro M1200" "Quadro M1200" ... 'data.frame': 793 obs. of 23 variables: $ dummy3 : int -1 -1 -1 -1 -1 -1 -1 -1 -1 1 ... $ _use_fast_math : chr "off" "off" "off" "off" ... $ _opt_level_ : int 2 2 2 2 2 2 2 2 2 2 ... $ _gpu_architecture_ : chr "sm_50" "sm_50" "sm_50" "sm_50" ... $ _ftz_ : chr "true" "true" "true" "true" ... $ _force_store_cache_ : chr "cg" "cg" "cg" "cg" ... $ _force_load_cache_ : chr "cs" "cs" "cs" "cs" ... $ _allow_expensive_optimizations_: chr "false" "false" "false" "false" ... $ _fmad_ : chr "true" "true" "true" "true" ... $ _optimize_ : int 2 2 2 2 2 2 2 2 2 3 ... $ dummy2 : int 1 1 1 1 1 1 1 1 1 -1 ... $ _preserve_relocs : chr "off" "off" "off" "off" ... $ _relocatable_device_code_ : chr "true" "true" "true" "true" ... $ _prec_div_ : chr "false" "false" "false" "false" ... $ dummy1 : int -1 -1 -1 -1 -1 -1 -1 -1 -1 1 ... $ _maxrregcount_ : int 63 63 63 63 63 63 63 63 63 32 ... $ dummy4 : int 1 1 1 1 1 1 1 1 1 -1 ... $ _prec_sqrt_ : chr "true" "true" "true" "true" ... $ _no_align_double : chr "off" "off" "off" "off" ... $ response : num 3.32 3.33 3.33 3.33 3.32 ... $ experiment_id : chr "Screening (GAU)" "Screening (GAU)" "Screening (GAU)" "Screening (GAU)" ... $ experiment_type : chr "screening" "screening" "screening" "screening" ... $ GPU : chr "Quadro M1200" "Quadro M1200" "Quadro M1200" "Quadro M1200" ...4.1.1.1.2.2.1 Setup4.1.1.1.2.2.2 Plot
library(dplyr)
library(tidyr)
library(tibble)
library(broom)
library(ggplot2)
library(latex2exp)
bind_confint_lm <- function(model, experiment) {
bind_cols(tidy(model) %>%
rename(mean = estimate) %>%
arrange(term),
data.frame(confint(model)) %>%
rownames_to_column() %>%
rename(ci_inf = `X2.5..`,
ci_sup = `X97.5..`) %>%
arrange(rowname)) %>%
select(-rowname) %>%
mutate(experiment = experiment)
}
plot_df = bind_rows(bind_confint_lm(lm_gau, "GAU"),
bind_confint_lm(lm_hwl, "HWL"),
bind_confint_lm(lm_ndl, "NDL")) %>%
mutate(term = trimws(term, whitespace = "[_, `]")) %>%
filter(!grepl("dummy", term)) %>%
filter(!grepl("Intercept", term)) %>%
mutate(term = gsub("_`", " = ", term,
fixed = FALSE)) %>%
mutate(term = gsub("`", " = ", term,
fixed = FALSE)) %>%
mutate(term = gsub("allow_expensive_optimizations", "exp_opt", term,
fixed = FALSE)) %>%
mutate(term = gsub("gpu_architecture", "arch", term,
fixed = FALSE)) %>%
mutate(term = gsub("no_align_double", "no_align_db", term,
fixed = FALSE)) %>%
mutate(term = gsub("relocatable_device_code", "rel_dev_code", term,
fixed = FALSE))
4.1.1.1.2.2.2 Plot
p_effects_quadro = ggplot() +
geom_point(data = plot_df,
aes(x = term,
y = mean),
size = 3) +
geom_errorbar(data = plot_df,
aes(x = term,
ymin = ci_inf,
ymax = ci_sup),
width = 0.4) +
    geom_hline(yintercept = 0.0,
               linetype = 2) +
ylab(TeX("Mean Effect Estimate $\\pm$ 95% CI")) +
ggtitle("Quadro M1200") +
facet_grid(experiment ~ .,
scales = "free_y") +
theme_bw(base_size = 24) +
theme(plot.title = element_text(size = 19),
strip.background = element_rect(fill = "white"),
axis.title.x = element_blank(),
axis.text.x = element_text(size = 18,
angle = 30,
hjust = 1,
vjust = 1))
p_effects_quadro
str() of the three Titan X data frames: Model Prediction (GAU) with 30 obs. of 19 variables, Baseline (GAU) with 60 obs. of 6 variables, and Screening (GAU) with 600 obs. of 23 variables (including the dummy1–dummy4 screening columns), with the same nvcc flag, response, and identifier columns as the Quadro M1200 data frames above.
4.1.1.1.2.3.1 Setup
library(dplyr)
library(tidyr)
library(tibble)
library(broom)
library(ggplot2)
library(latex2exp)
bind_confint_lm <- function(model, experiment) {
bind_cols(tidy(model) %>%
rename(mean = estimate) %>%
arrange(term),
data.frame(confint(model)) %>%
rownames_to_column() %>%
rename(ci_inf = `X2.5..`,
ci_sup = `X97.5..`) %>%
arrange(rowname)) %>%
select(-rowname) %>%
mutate(experiment = experiment)
}
plot_df = bind_rows(bind_confint_lm(lm_gau, "GAU"),
bind_confint_lm(lm_hwl, "HWL"),
bind_confint_lm(lm_ndl, "NDL")) %>%
mutate(term = trimws(term, whitespace = "[_, `]")) %>%
filter(!grepl("dummy", term)) %>%
filter(!grepl("Intercept", term)) %>%
mutate(term = gsub("_`", " = ", term,
fixed = FALSE)) %>%
mutate(term = gsub("`", " = ", term,
fixed = FALSE)) %>%
mutate(term = gsub("allow_expensive_optimizations", "exp_opt", term,
fixed = FALSE)) %>%
mutate(term = gsub("gpu_architecture", "arch", term,
fixed = FALSE)) %>%
mutate(term = gsub("no_align_double", "no_align_db", term,
fixed = FALSE)) %>%
mutate(term = gsub("relocatable_device_code", "rel_dev_code", term,
fixed = FALSE))
4.1.1.1.2.3.2 Plot
p_effects_titan = ggplot() +
geom_point(data = plot_df,
aes(x = term,
y = mean),
size = 3) +
geom_errorbar(data = plot_df,
aes(x = term,
ymin = ci_inf,
ymax = ci_sup),
width = 0.4) +
    geom_hline(yintercept = 0.0,
               linetype = 2) +
ylab(TeX("Mean Effect Estimate $\\pm$ 95% CI")) +
ggtitle("Titan X") +
facet_grid(experiment ~ .,
scales = "free_y") +
theme_bw(base_size = 24) +
theme(plot.title = element_text(size = 19),
strip.background = element_rect(fill = "white"),
axis.title.x = element_blank(),
axis.text.x = element_text(size = 18,
angle = 30,
hjust = 1,
vjust = 1))
p_effects_titan
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.12 0.105 0.104 0.105 0.104 ...
 $ experiment_type: chr [1:30] "OT" "OT" "OT" "OT" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.106 0.106 0.106 0.106 0.106 ...
 $ ci95           : num [1:30] 0.00307 0.00307 0.00307 0.00307 0.00307 ...
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.215 0.193 0.19 0.208 0.154 ...
 $ experiment_type: chr [1:30] "OT" "OT" "OT" "OT" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.186 0.186 0.186 0.186 0.186 ...
 $ ci95           : num [1:30] 0.0108 0.0108 0.0108 0.0108 0.0108 ...
library(ggplot2)
library(dplyr)
df = read.csv("autotuning_screening_experiment/data/all_speedups.csv") %>%
    mutate(experiment_type = factor(experiment_type,
                                    levels = c("Screening",
                                               "OpenTuner")))
ggplot() +
geom_point(data = df,
size = 7,
alpha = 1,
aes(y = experiment_id,
x = speedup,
color = GPU),
position = position_dodge(width = 0.8),
show.legend = TRUE) +
    geom_vline(xintercept = 1,
               linetype = 2) +
geom_linerange(data = df,
aes(y = experiment_id,
xmin = speedup,
xmax = 1.0,
color = GPU),
linetype = 2,
position = position_dodge(width = 0.8),
show.legend = FALSE) +
facet_grid(. ~ experiment_type) +
xlab("Speedup vs. -O3") +
scale_color_brewer(palette = "Dark2") +
theme_bw(base_size = 37) +
theme(axis.title.y = element_blank(),
legend.position = c(0.85, 0.9),
legend.background = element_blank(),
legend.title = element_blank(),
legend.text = element_text(size = 33),
strip.text = element_text(size = 33),
strip.background = element_rect(fill = "transparent"))
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.104 0.104 0.104 0.105 0.105 ...
 $ experiment_type: chr [1:30] "OT:200" "OT:200" "OT:200" "OT:200" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.104 0.104 0.104 0.104 0.104 ...
 $ ci95           : num [1:30] 0.000213 0.000213 0.000213 0.000213 0.000213 ...
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.201 0.18 0.174 0.18 0.164 ...
 $ experiment_type: chr [1:30] "OT:200" "OT:200" "OT:200" "OT:200" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.177 0.177 0.177 0.177 0.177 ...
 $ ci95           : num [1:30] 0.00714 0.00714 0.00714 0.00714 0.00714 ...
library(ggplot2)
library(patchwork)
p_effects_titan / p_effects_quadro
library(ggplot2)
library(patchwork)
p_timing_ot_quadro = p_timing_quadro +
geom_jitter(data = ot_df,
size = 4,
shape = 1,
alpha = 1,
aes(x = experiment_type,
y = response,
color = experiment_type),
width = 0.3,
height = 0.0,
show.legend = FALSE) +
geom_errorbar(data = ot_df,
aes(x = experiment_type,
ymin = mean_r - ci95,
ymax = mean_r + ci95,
color = experiment_type),
show.legend = FALSE,
width = 0.3) +
geom_point(data = ot_df,
size = 4,
aes(x = experiment_type,
y = mean_r,
color = experiment_type),
show.legend = FALSE) +
geom_jitter(data = ot_longer_df,
size = 4,
shape = 1,
alpha = 1,
aes(x = experiment_type,
y = response,
color = experiment_type),
width = 0.3,
height = 0.0,
show.legend = FALSE) +
geom_errorbar(data = ot_longer_df,
aes(x = experiment_type,
ymin = mean_r - ci95,
ymax = mean_r + ci95,
color = experiment_type),
show.legend = FALSE,
width = 0.3) +
geom_point(data = ot_longer_df,
size = 4,
aes(x = experiment_type,
y = mean_r,
color = experiment_type),
show.legend = FALSE)
p_timing_ot_titan = p_timing_titan +
geom_jitter(data = titan_ot_df,
size = 4,
shape = 1,
alpha = 1,
aes(x = experiment_type,
y = response,
color = experiment_type),
width = 0.3,
height = 0.0,
show.legend = FALSE) +
geom_errorbar(data = titan_ot_df,
aes(x = experiment_type,
ymin = mean_r - ci95,
ymax = mean_r + ci95,
color = experiment_type),
show.legend = FALSE,
width = 0.3) +
geom_point(data = titan_ot_df,
size = 4,
aes(x = experiment_type,
y = mean_r,
color = experiment_type),
show.legend = FALSE) +
geom_jitter(data = titan_ot_longer_df,
size = 4,
shape = 1,
alpha = 1,
aes(x = experiment_type,
y = response,
color = experiment_type),
width = 0.3,
height = 0.0,
show.legend = FALSE) +
geom_errorbar(data = titan_ot_longer_df,
aes(x = experiment_type,
ymin = mean_r - ci95,
ymax = mean_r + ci95,
color = experiment_type),
show.legend = FALSE,
width = 0.3) +
geom_point(data = titan_ot_longer_df,
size = 4,
aes(x = experiment_type,
y = mean_r,
color = experiment_type),
show.legend = FALSE)
p_timing_ot_titan | p_timing_ot_quadro
library(ggplot2)
library(patchwork)
p_speedup_titan | p_speedup_quadro
library(ggplot2)
library(primes)
prime_list = generate_n_primes(10)
combinations = expand.grid(p = prime_list, q = prime_list)
combinations$is_prime = is_prime((as.double(combinations$p) ^
as.double(combinations$q)) + 1.0)
ggplot() +
geom_tile(data = combinations,
aes(x = as.factor(p),
y = as.factor(q),
fill = is_prime),
show.legend = FALSE) +
scale_fill_brewer(palette = "Greys") +
theme_void()
library(dplyr)
library(ggplot2)
library(latex2exp)
library(stringr)
plot_df = speedups_df %>%
filter(id == "Rodinia",
gpu != "Tesla-K40-nvcc65",
kernel != "ParticleFilterFloat",
kernel != "ParticleFilterNaive") %>%
mutate(kernel = gsub("Pathfinder", "pathfinder", kernel, fixed = TRUE),
kernel = factor(kernel,
levels = c("b+tree", "backprop", "bfs", "gaussian",
"heartwall", "hotspot", "kmeans",
"lavaMD", "lud", "myocyte", "needle",
"pathfinder"),
labels = c("BPT", "BCK", "BFS", "GAU", "HWL",
"HOT", "KMN", "LMD", "LUD", "MYO",
"NDL", "PTF")))
ggplot() +
geom_point(data = plot_df,
aes(y = kernel,
x = speedup,
color = gpu),
size = 5,
position = position_dodge(width = 0.8)) +
geom_linerange(data = plot_df,
aes(y = kernel,
xmin = speedup,
xmax = 1.0,
color = gpu),
linetype = 2,
position = position_dodge(width = 0.8)) +
scale_color_brewer(palette = "Set1") +
geom_vline(xintercept = 1,
linetype = 2,
size = 0.8) +
xlab(TeX("Speedup against \\textit{-O2}")) +
theme_bw(base_size = 30) +
theme(legend.position = c(0.9, 0.1),
legend.title = element_blank(),
legend.background = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_text(face = "italic"))
- Processor: Xeon E5-2630V3
- Single Core Turbo Clock: 3.2e9 Hz
- DGEMV Input Size: M = 20000, N = 20000
- Run MKL with M = N = 20000 on the Xeon E5-2630 v3
- Verify if matrix initialization is accounted for
In the end, EGO explores well but does not gain much over the initial sampling. At the same time, there may not be much left to gain.
Xeon xeon_e5_2630_v3. https://ranker.sisoftware.co.uk/top_device.php?q=c2ffc9f1d7bad7eadafc8eb383a5ccf1c1e78fb282a4dce1d1f792f7cafdcbfddba895a492&l=en reports 26.14 GFLOPS, but that figure is questionable.
- 2 inst/cycle * 2 (FMA) = 4 flops/cycle * the vector width (4 doubles for AVX2, 256 bits), i.e. 16 flops/cycle per core (see the sketch after these notes)
- 2.4 GHz * 8 cores -> 307.2 GFLOPS
- 2.8 GHz (all-core turbo) * 8 cores -> 358.4 GFLOPS https://www.cpu-monkey.com/fr/compare_cpu-intel_xeon_e5_2630_v3-492-vs-intel_xeon_e3_1225_v3-75
- 3.2 GHz (single core)
The right reference for all of this is: https://ark.intel.com/content/www/us/en/ark/products/83356/intel-xeon-processor-e5-2630-v3-20m-cache-2-40-ghz.html
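As a quick check of the figures above, the short R sketch below recomputes the double-precision peaks from the 16 flops/cycle/core derivation; the clock frequencies are the ones listed in these notes.
# Double-precision peak of the Xeon E5-2630 v3, from the reasoning above:
# 2 instructions/cycle * 2 flops per FMA * 4 doubles per AVX2 vector.
flops_per_cycle_per_core <- 2 * 2 * 4
cores <- 8
2.4 * cores * flops_per_cycle_per_core   # base clock, all cores:   307.2 GFLOPS
2.8 * cores * flops_per_cycle_per_core   # all-core turbo:          358.4 GFLOPS
3.2 * flops_per_cycle_per_core           # single-core turbo clock:  51.2 GFLOPS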
For “mm” (DGEMM):
The matrices are 2,000 elements per side.
One multiply + one add per element, hence:
Well, in the end it is a DGEMV, so we can still compute the flop bound (see the sketch after the console output below).
There are two kinds of RAM: if it is a server with RAM clocked at 1600, we lose 15%, so we are far from the peak.
To do this properly we should compare against MKL, but it is possible that the parameterized SPAPT kernel cannot reach this peak (which, in the DGEMV case, should be close to a memcopy).
Orio uses gcc internally;
We could try again with DGEMM, but we may run into the same problem: if their parameterized kernel is poorly written, we will stay far from the peak…
In the end, the benchmark set is badly designed. :(
[1] 52.08333
[1] 8e+08
[1] 400020000
[1] 0.015625
[1] 0.05423729
Time taken: max(mem, cpu)
[1] 0.054
[1] 0.8677966
[1] 20.92593
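The raw values above follow from this kind of back-of-the-envelope bound; the hedged R sketch below recomputes the flop and byte counts and the compute- and memory-bound times for the DGEMV at N = 20000, assuming the 51.2 GFLOPS single-core peak derived earlier and the 59 GB/s peak bandwidth used in the roofline code that follows.
# Sketch: DGEMV bound for M = N = 20000, double precision.
n <- 20000
flops <- 2 * n^2                # one multiply + one add per matrix element: 8e+08
bytes <- 8 * n^2                # the matrix is read once, 8 bytes per element: 3.2e+09
compute_time <- flops / 51.2e9  # compute-bound lower bound: ~0.0156 s
memory_time <- bytes / 59e9     # memory-bound lower bound:  ~0.0542 s
max(compute_time, memory_time)  # DGEMV is memory bound:     ~0.054 s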
library(ggplot2)
library(dplyr)
library(directlabels)
library(ggrepel)
library(latex2exp)
roofline <- function(peak_perf, peak_mem, arithm_intensity) {
return(min(peak_mem * arithm_intensity, peak_perf))
}
perf_gemv <- function(size) {
return(((size ^ 2) * 2) / 1e9)
}
mem_gemv <- function(size) {
return((((size ^ 2) + size) * 8) / 1e9)
}
# Peak figures from the notes above: 16 double-precision flops/cycle/core.
peak_perf_8core = 307.2  # GFLOPS at the 2.4 GHz base clock, 8 cores
freq_8core = 2.4         # GHz, base clock with all cores active
freq_1core = 3.2         # GHz, single-core turbo clock
cores = 8
# Scale the 8-core peak to one core at the turbo clock: 51.2 GFLOPS.
peak_perf_1core = ((peak_perf_8core / freq_8core) *
    freq_1core) / cores
peak_mem = 59            # GB/s, peak memory bandwidth
arithm_intensity = seq(0, 2, length.out = 1000)
roofline_df = data.frame(arithmetic_intensity = arithm_intensity,
performance = sapply(arithm_intensity,
function(x) {
roofline(peak_perf_1core, peak_mem, x) }))
roofline_df = roofline_df %>%
mutate(memory_bound = factor(performance < peak_perf_1core,
levels = c(TRUE, FALSE),
labels = c("Memory Bound",
"Compute Bound")))
gemv_size = 20000
gemv_perf = perf_gemv(gemv_size)
gemv_mem = mem_gemv(gemv_size)
gemv_peak_perf = max(gemv_perf / peak_perf_1core,
gemv_mem / peak_mem)
fastest_time = gemv_perf / 1.160198
gemv_df = data.frame(id = "GEMV",
peak_theoretical = gemv_peak_perf,
arithmetic_intensity = gemv_perf / gemv_mem,
peak_achieved = fastest_time)
ggplot() +
geom_line(data = roofline_df,
aes(x = arithmetic_intensity,
y = performance,
color = memory_bound),
size = 2.4) +
geom_dl(data = roofline_df %>%
filter(performance < peak_perf_1core),
aes(x = arithmetic_intensity,
y = performance,
color = memory_bound,
label = paste(peak_mem, "GB/s")),
method = list("smart.grid", rot = 53, cex = 2.5, vjust = 0)) +
geom_dl(data = roofline_df %>%
filter(performance >= peak_perf_1core),
aes(x = arithmetic_intensity,
y = performance,
color = memory_bound,
label = paste(peak_perf_1core, "GFLOPs/s")),
method = list("smart.grid", rot = 0, cex = 2.5, vjust = 2.7)) +
geom_point(data = gemv_df,
aes(x = arithmetic_intensity,
y = arithmetic_intensity * peak_mem),
size = 6) +
geom_text_repel(data = gemv_df,
aes(x = arithmetic_intensity,
y = arithmetic_intensity * peak_mem,
label = "Theoretical GEMV (N = 2e4)"),
size = 11,
hjust = -0.15) +
geom_point(data = gemv_df,
aes(x = arithmetic_intensity,
y = peak_achieved),
size = 6) +
geom_text_repel(data = gemv_df,
aes(x = arithmetic_intensity,
y = peak_achieved,
label = "Best Achieved GEMV (20x slower)"),
size = 11,
hjust = -0.15) +
ylab("Single Core GFLOPs/s") +
xlab("FLOPs/byte") +
scale_color_brewer(palette = "Dark2") +
theme_bw(base_size = 28) +
theme(legend.position = c(0.86, 0.1),
legend.background = element_blank(),
legend.title = element_blank(),
legend.text = element_text(size = 30))
High Performance Computing has been a cornerstone of collective scientific and industrial progress for at least five decades. At the cost of increased complexity, software and hardware engineering advances continue to overcome the challenges in the way of sustained performance improvements. This mounting complexity means that reaching the advertised hardware performance for a given program requires not only expert knowledge of the target hardware architecture, but also mastery of programming models and languages for parallel and distributed computing.
If we state performance optimization problems as search or learning problems, by converting implementation and configuration choices to parameters which might affect performance, we can draw and adapt proven methods from search, mathematical optimization and statistics. The effectiveness of these adapted methods on autotuning problems varies greatly, and hinges on practical and mathematical properties of the problem and the corresponding search space.
When adapting methods for autotuning, we must face challenges emerging from practical properties such as restricted time and cost budgets, constraints on feasible parameter values, and the need to mix categorical, continuous, and discrete parameters. To achieve useful results, we must also choose methods whose hypotheses are compatible with the problem's search space, such as the existence of discoverable, or at least exploitable, relationships between parameters and performance. Choosing an autotuning method requires striking a balance between exploration of a problem, where we seek to discover and explain relationships between parameters and performance, and exploitation of the best optimizations we can find, where we seek only to improve the target performance metric.
The effectiveness of search heuristics on autotuning can be limited \cite{seymour2008comparison,balaprakash2011can,balaprakash2012experimental}, among other factors, by underlying hypotheses about the search space, such as the reachability of the global optimum and the smoothness of search space surfaces, which are frequently not respected. The derivation of relationships between parameters and performance from search heuristic optimizations is greatly hindered, if not rendered impossible, by the biased way these methods explore parameters. Some parametric learning methods, such as Design of Experiments, are not widely applied to autotuning. These methods perform structured parameter exploration, and can be used to build and validate performance models, generating transparent and cost-effective optimizations \cite{mametjanov2015autotuning,bruel2019autotuning}. Other methods from the parametric family are more widely used, such as Bandit Algorithms \cite{xu2017parallel}. Nonparametric learning methods, such as Decision Trees \cite{balaprakash2016automomml} and Gaussian Process Regression \cite{parsa2019pabo}, can greatly reduce model bias, at the expense of increased prediction variance. Figure \ref{fig:tree} categorizes some autotuning methods according to the key hypotheses and branching questions underlying each method.
During this thesis I have adapted different search heuristics and statistical learning methods and studied their effectiveness at optimizing performance in several autotuning domains. At the beginning of my PhD at the University of São Paulo (USP), I published a paper on optimizing the configuration of the CUDA compiler \cite{bruel2017autotuning}, where we reached up to a 4-fold performance improvement in comparison with a high-level compiler optimization. In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo Alto, I wrote a paper on the autotuning of a High-Level Synthesis compiler for FPGAs \cite{bruel2017autotuninghls}, where we reached, on average, 25% improvements in the performance, size, and complexity of designs.
At the end of 2017, I joined the cotutelle PhD program at the University of Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I applied Design of Experiments to the autotuning of a source-to-source transformation compiler \cite{bruel2019autotuning}, showing that we can achieve significant speedups under a strict budget by exploiting search space structure. I have also collaborated with HPE on another paper, providing an analysis of the applicability of autotuning methods to a Hardware-Software Co-design problem \cite{bruel2017generalize}. During my Teaching Assistant internships, I published one paper \cite{bruel2017openmp} on parallel programming teaching and collaborated on another \cite{goncalves2016openmp}, where we showed that teaching lower-level programming models, despite being more challenging at first, provides a stronger core understanding.
I continue to collaborate with HPE researchers on the application of autotuning methods to optimize Neural Networks, hardware accelerators for Deep Learning, and algorithms for dealing with network congestion. With my advisors, I currently manage one undergraduate and four master's students, who are applying the statistical learning autotuning methods I studied and adapted to different domains, in the context of a joint USP/HPE research project. I am strongly motivated to continue pursuing a career in Computer Science research, aiming to produce rigorous and valuable contributions. I hereby submit my thesis proposal and application to the Microsoft Latin America PhD Award.
- Registration OK
- Project, lab, advisor?
- Can it be Arnaud, or someone at LIG?
- Almost 2x peak when using march=native
- Despite that, no improvement in the best configuration for the dgemv kernel
- Around 20% improvement for the -O3 kernel version
- Good argument that the kernel is not capable of leveraging the parameters
- Looking at a specific semi-automatic example
- Factors were not always eliminated
- Later, experiments were automated