Pedro’s Journal

1 Setup

1.1 Julia

Pkg.add("Plots")
Pkg.add("Lint")
Pkg.add("Gadfly")
Pkg.add("ProfileView")
Pkg.add("CSV")
Pkg.add("StatsBase")
Pkg.add("StatsModels")
Pkg.add("GLM")
Pkg.add("RDatasets")
Pkg.add("IterTools")
Pkg.add("Missings")
Pkg.add("RCall")
Pkg.add("DataFrames")

Pkg.update()

1.2 R

Installing R dependencies:

install.packages(c("ggplot2", "dplyr", "tidyr", "rjson", "GGally",
                 "plotly", "rPref", "pracma", "FrF2", "AlgDesign",
                 "quantreg"),
                 repos = "https://mirror.ibcp.fr/pub/CRAN/")

1.3 Modifying & Analysing the FPGA Data Set

Cloning and updating the legup-tuner repository:

git clone https://github.com/phrb/legup-tuner.git || (cd legup-tuner && git pull)

Export your path to the repository_dir variable:

pwd | tr -d "\n"

1.4 Updating & Cloning Repositories

1.4.1 GPU Autotuning Screening Experiment

git clone https://github.com/phrb/autotuning_screening_experiment.git || (cd autotuning_screening_experiment && git pull)

2 2019

2.1 August

2.1.1 [2019-08-05 Mon]

2.1.1.1 Performance Modeling and Optimization of the bicg SPAPT kernel

We are also interested in exploring new design construction metrics, in order to understand which factors are best estimated at each step and possibly discard designs not capable of estimating parameters of interest.
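
As one example of a design construction metric we could use, the D-criterion of a candidate design can be computed with eval.design from the AlgDesign package. The sketch below assumes hypothetical names: dlmt_formula for the current performance model and step_design for the design selected at a given step.

library(AlgDesign)

# Hypothetical names: dlmt_formula is the current performance model and
# step_design is the data frame of configurations selected for this step.
design_quality <- eval.design(frml = dlmt_formula, design = step_design)

# The normalized D-criterion of the design; higher values mean the model
# parameters are better estimated by the selected experiments.
design_quality$determinant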

We chose to start with the bicg SPAPT kernel. It was one of the kernels where our approach achieved similar-performing solutions using fewer experiments, in comparison with a uniform random sampling approach, across 10 repetitions of an experiment with a fixed budget of 400 experiments and a limit of 4 iterations, or steps, of our approach.

The following sections describe our ongoing exploration of the performance modeling and optimization of the bicg kernel.

2.1.1.1.1 Original Results
2.1.1.1.1.1 Update File
2.1.1.1.2 Removing Cubic Terms
2.1.1.1.2.1 Iterations that found the Best Configuration

Figure fig:best-found shows the iterations where the best configuration of each of the 10 runs was found, for uniform sampling and for our approach.

The results are the same as the ones presented in the paper, but the data are not. The data for the figures in this report were obtained with a new set of 10 repetitions of the same experiment, but using a performance model with only linear and quadratic terms, removing the cubic terms used in the paper’s experiments.
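
A minimal sketch of how such a model formula can be written in R, using hypothetical factor names (x1, x2, x3) instead of the actual SPAPT parameters:

# Hypothetical numeric factors; the real model uses the bicg SPAPT parameters.
factors <- c("x1", "x2", "x3")

linear_terms    <- paste(factors, collapse = " + ")
quadratic_terms <- paste(sprintf("I(%s^2)", factors), collapse = " + ")

# Linear and quadratic terms only, with no cubic terms such as I(x1^3).
model_formula <- as.formula(paste("cost ~", linear_terms, "+", quadratic_terms))
model_formula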

library(ggplot2)
library(dplyr)
library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 4, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    #geom_label_repel(data = . %>% group_by(application) %>%
    #                              filter(technique == "RS") %>%
    #                              filter(best_iteration == min(best_iteration)),
    #                 aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    #scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.9) +
    #annotation_logticks(sides = "b") +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 30) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          #text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    #scale_color_brewer(palette = "Set1")
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 30) +
                theme(text = element_text(family = "serif"),
                      legend.position = c(0.2, 0.5),
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}


mixed_plot_data = it_data %>%
    mutate(technique = gsub(pattern = "DLMT",
                            replacement = "No Cubic Terms",
                            x = technique, fixed = TRUE))

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)

The results are very similar to the ones in the paper, but our approach found slightly better configurations slightly faster. Random sampling also found a much better configuration much faster than before in one experiment.

2.1.1.1.2.2 Eliminated Factors and Fixed Values

Figure fig:eliminated-terms shows the count of model terms that were eliminated at each color-coded DLMT step. As before, we see that OMP and SCR were the most eliminated factors, especially on the first step.

base_size <- 25

ggplot(subset(eliminated_terms, removed_variables != ""),
       aes(x = removed_variables,
           fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = c(0.8, 0.95),
        legend.background = element_blank(),
        legend.direction = "horizontal",
        axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(name = "Step", palette = "Set2")
2.1.1.1.2.3 Fixed Factors & Values in the Explored Spaces

Figure fig:encpnc-ff-ss-be shows the factors that were identified as significant by ANOVA, with a significance threshold of $0.01$, at each DLMT step. Identifying a factor removes the corresponding parameter from the model, regardless of which model term was identified. Figures are grouped by each of the 10 experiments. Figure headers identify each run, and correspond to the names of the Grid5000 machines where the experiments were run.
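
A minimal sketch of this selection, assuming a data frame step_data with a cost column and one column per factor (the names are illustrative, not the ones used by our implementation):

threshold <- 0.01

# Fit an ANOVA model on the measurements collected so far; step_data has one
# cost column and one column per factor.
fit <- aov(cost ~ ., data = step_data)
anova_table <- summary(fit)[[1]]

# Terms with a p-value below the threshold are considered significant; the
# corresponding parameters are then fixed and removed from the model.
significant <- trimws(rownames(anova_table)[which(anova_table[["Pr(>F)"]] < threshold)])
significant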

It is interesting that only parasilo-19, parasilo-20, and parasilo-9 eliminated any factor other than OMP and SCR on the first step where any factor was eliminated, and also that those runs fixed the factor RT_I to the same value.

We can also see that OMP seems to be the parameter behind the most extreme changes in the execution time of tested configurations. This is especially clear at parasilo-13 and parasilo-10, where the explored configurations have relatively high execution times until after step 3, when OMP is fixed.

A very slight worsening of execution times, that is, an increase, can be seen at the fourth step of parasilo-11, after RT_I was fixed. The minimum execution time seems to be higher than in the third step.

./img/explored_space_no_cubic.pdf

./img/models_no_cubic.pdf

./img/models_fixed_omp_no_cubic.pdf

2.1.1.1.2.4 Predicted Performance at Each Step

Figure fig:nc-predicted-best compares the performance predicted by the fitted model, represented by the green line, with the performance of the best point found at each design, represented by the blue line. The red line marks the best point found so far.
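
A minimal sketch of how the predicted best point can be obtained at each step, assuming fit is the model fitted with that step’s data and candidates is the pruned candidate set (both names are assumptions):

# Predict the cost of every pruned candidate with the model fitted at this step.
predicted_cost <- predict(fit, newdata = candidates)

# The configuration with the lowest predicted cost is the predicted best; its
# measured cost is then compared with the best point found in the design.
predicted_best <- candidates[which.min(predicted_cost), ]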

./img/predictions_no_cubic.pdf

It is only at the fourth step of parasilo-17 that the point with the best predicted performance was better than the best point in the design for that step, while also being better than the best point found so far. Although we seem to be effectively restricting the search space with our exploration, which is evidenced by the improvement that occurs as steps progress and by the best points being found inside designs, the models fitted using experiment data are not able to improve the current best in the majority of cases.

2.1.1.1.2.5 D-Optimality
2.1.1.1.2.5.1 D-Criterion

2.1.1.1.3 Fixing Binary Factors
2.1.1.1.3.1 Iterations that found the Best Configuration

Since OMP was quickly fixed at most iterations of the previous experiment, and had the largest impact on performance we could observe, we decided to fix it, along with all other binary parameters.

In the experiments shown in this section, all binary parameters are turned on by default, including in the random sampling experiments. Figure fig:nobin-best-found shows the iterations where the best configuration of each of the 10 runs was found, for uniform sampling and for our approach. We see that random sampling found an extremely fast point right at the second tested configuration.
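
A minimal sketch of how the search space can be restricted before sampling, assuming a data frame search_space of candidate configurations and a hypothetical “on” level name for the binary parameters:

# Hypothetical "on" level for the binary parameters; the actual experiment
# fixes every binary parameter of the bicg kernel.
binary_on <- c(OMP = "on", SCR = "on")

for (parameter in names(binary_on)) {
    search_space <- search_space[search_space[[parameter]] == binary_on[[parameter]], ]
}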

library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(subset(it_data, technique == "DLMT")$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 2, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.45) +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 17) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 17) +
                theme(text = element_text(family = "serif"),
                      legend.position = "bottom",
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
2.1.1.1.3.2 Eliminated Factors and Fixed Values

Even after fixing the binary parameters, or maybe because of it, some parameters were still identified as significant and fixed. The T1_I parameter was fixed at only two iterations in the previous experiment, but it was the most fixed parameter in this one, surpassing RT_I, which was frequently eliminated in both experiments. This suggests that fixing the binary parameters allowed the significance of the other parameters to be detected.

base_size <- 18

ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(palette = "Set1")
2.1.1.1.3.3 Fixed Factors & Values in the Explored Spaces

We see that parasilo-13 and parasilo-23 did not eliminate any parameters after 4 steps. We also see that, at parasilo-12, parasilo-20, and parasilo-25, for example, the search space seems to be restricted to a region with worse performance than in the previous steps of the same experiment. This could be due to the fact that these experiments fixed the RT_I parameter to its fifth level. A slight improvement of the configurations tested can be seen at the other experiments, where other factors were fixed.

Identifying when a restriction of the search space to a worse region happens would be useful if we could revert or redo an optimization step. This is an interesting motivation for keeping a human in the loop of the optimization process.
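
A minimal sketch of how step decisions could be recorded so that a step can later be reverted; the structure and function names below are assumptions, not part of the current implementation:

step_history <- list()

# Record the factor fixed at a step together with the best cost observed, so
# that the decision can be inspected or undone later.
record_step <- function(history, factor_name, fixed_level, best_cost) {
    c(history, list(list(factor = factor_name,
                         level = fixed_level,
                         best_cost = best_cost)))
}

# Reverting a step drops its entry; the pruned candidate set would then be
# rebuilt from the remaining fixed factors.
revert_last_step <- function(history) head(history, -1)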

2.1.1.1.3.4 D-Optimality
2.1.1.1.3.4.1 Performance at Each Step

2.1.1.1.3.4.2 D-Criterion

2.1.1.1.4 Reuse Data for aov and lm

Significant changes were made to the initial DLMT implementation. We decided that there was no good reason not to reuse the data obtained from evaluating designs at each step, nor the various samples of the search space taken at different points. Now, all evaluated experiments compose a single growing design used by aov to identify the most significant factors, and all samples from the search space compose a single data set used by optFederov to select experiments. The data set is pruned in both the aov and lm analyses, to guarantee that only experiments with the correct levels of the fixed factors are used. This is crucial for both analyses, since having different values for fixed factors that are not in the model would introduce unexplainable variance into the data set.

Using all experimental data in aov is interesting because it is always worth considering additional information on the factors currently being studied. On one hand, it might not allow enough “flexibility”: when we consider regression only on a small restricted subspace, points outside the subspace would still impact the regression, even though at that moment we are interested in the significance of the factor inside the subspace. On the other hand, using all available data makes sense because we are also interested in exploring any global structure of the search space, and keeping points from other subspaces increases the chance of “catching” the significance of a factor globally.

Using the entire sampled space, across all steps, as candidates for optFederov has no downsides, provided we prune the search space to account for the current constraints on factor levels fixed in previous steps. We increase the size of the available set of configurations that can compose a new design each time we sample a new subspace, which should improve the quality of the designs produced as we progress.
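
A minimal sketch of this pruning and reuse, assuming all_samples accumulates every sampled configuration across steps, fixed_levels is a named list with the factor levels fixed at previous steps, and model_formula is the current performance model (all three names are assumptions):

library(AlgDesign)

# Keep only candidates that agree with every factor level fixed so far, so
# that factors absent from the model cannot introduce unexplained variance.
pruned <- all_samples
for (parameter in names(fixed_levels)) {
    pruned <- pruned[pruned[[parameter]] == fixed_levels[[parameter]], ]
}

# Select the next D-optimal design from the accumulated, pruned candidates;
# the number of trials per step is illustrative.
next_design <- optFederov(frml = model_formula, data = pruned, nTrials = 24)$design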

2.1.1.1.4.1 Iterations that found the Best Configuration
library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 4, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    #geom_label_repel(data = . %>% group_by(application) %>%
    #                              filter(technique == "RS") %>%
    #                              filter(best_iteration == min(best_iteration)),
    #                 aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    #scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.9) +
    #annotation_logticks(sides = "b") +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 30) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          #text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    #scale_color_brewer(palette = "Set1")
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 30) +
                theme(text = element_text(family = "serif"),
                      legend.position = c(0.2, 0.5),
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

mixed_plot_data = bind_rows(mixed_plot_data,
                            it_data %>%
                            mutate(technique = gsub(pattern = "DLMT",
                                                    replacement = "Reusing Designs",
                                                    x = technique, fixed = TRUE)) %>%
                            filter(technique != "RS"))

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
2.1.1.1.4.2 Eliminated Factors and Fixed Values
base_size <- 25

ggplot(subset(eliminated_terms, removed_variables != ""),
       aes(x = removed_variables,
           fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = c(0.8, 0.95),
        legend.background = element_blank(),
        legend.direction = "horizontal",
        axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(name = "Step", palette = "Set2")
2.1.1.1.4.3 Fixed Factors & Values in the Explored Spaces

./img/explored_space_reuse.pdf

./img/models_reuse.pdf

./img/models_fixed_omp_reuse.pdf

2.1.1.1.4.4 Predicted Performance at Each Step

./img/predictions_reuse.pdf

2.1.1.1.4.5 D-Optimality
2.1.1.1.4.5.1 D-Criterion

2.1.1.1.5 Reuse Data for aov and lm: Binary set to False

This experiment uses an initial model with linear and quadratic terms for all factors. All binary parameters were fixed to their False value. We hoped to determine whether the effects of binary factors, which are proportionally much larger than the effects of the other factors, were preventing the detection of the significance of the other factors.

2.1.1.1.5.1 Iterations that found the Best Configuration
library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 2, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    #geom_label_repel(data = . %>% group_by(application) %>%
    #                              filter(technique == "RS") %>%
    #                              filter(best_iteration == min(best_iteration)),
    #                 aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    #scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.45) +
    #annotation_logticks(sides = "b") +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 17) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    #scale_color_brewer(palette = "Set1")
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 17) +
                theme(text = element_text(family = "serif"),
                      legend.position = "bottom",
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
2.1.1.1.5.2 Eliminated Factors and Fixed Values
base_size <- 18

ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(palette = "Set1")
2.1.1.1.5.3 Fixed Factors & Values in the Explored Spaces

2.1.1.1.5.4 Predicted Performance at Each Step

2.1.1.1.5.5 D-Optimality
2.1.1.1.5.5.1 D-Criterion

2.1.1.1.6 Updated DLMT with Quantile Regression ($\tau = 0.05$, Binary Parameters)
2.1.1.1.6.1 Iterations that found the Best Configuration
library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 2, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    #geom_label_repel(data = . %>% group_by(application) %>%
    #                              filter(technique == "RS") %>%
    #                              filter(best_iteration == min(best_iteration)),
    #                 aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    #scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.45) +
    #annotation_logticks(sides = "b") +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 17) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    #scale_color_brewer(palette = "Set1")
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 17) +
                theme(text = element_text(family = "serif"),
                      legend.position = "bottom",
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

mixed_plot_data = bind_rows(mixed_plot_data,
                            it_data %>%
                            mutate(technique = gsub(pattern = "DLMT",
                                                    replacement = "Quantile Regression (tau = 0.05)",
                                                    x = technique, fixed = TRUE)) %>%
                            filter(technique != "RS"))

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
2.1.1.1.6.2 Eliminated Factors and Fixed Values
base_size <- 18

ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(palette = "Set1")
2.1.1.1.6.3 Fixed Factors & Values in the Explored Spaces

2.1.1.1.6.4 Predicted Performance at Each Step

2.1.1.1.6.5 D-Optimality
2.1.1.1.6.5.1 D-Criterion

2.1.1.1.7 Running more Steps on the Original Experiment
2.1.1.1.7.1 Iterations that found the Best Configuration

Figure fig:nobin-best-found shows the iterations where the best configuration of each of the 10 runs was found, for uniform sampling and for our approach.

library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(subset(it_data, technique == "DLMT")$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 2, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.45) +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 17) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 17) +
                theme(text = element_text(family = "serif"),
                      legend.position = "bottom",
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

mixed_plot_data = bind_rows(mixed_plot_data,
                            it_data %>%
                            mutate(technique = gsub(pattern = "DLMT",
                                                    replacement = "Running 8 Steps",
                                                    x = technique, fixed = TRUE)) %>%
                            filter(technique != "RS"))

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
2.1.1.1.7.2 Eliminated Factors and Fixed Values
base_size <- 18

ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(palette = "Set1")
2.1.1.1.7.3 Fixed Factors & Values in the Explored Spaces

2.1.1.1.7.4 D-Optimality
2.1.1.1.7.4.1 Performance at Each Step

2.1.1.1.7.4.2 D-Criterion

2.1.1.1.8 Running aov, lm, with -march=native
2.1.1.1.8.1 Iterations that found the Best Configuration
library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor",
                                                 "gesummv", "dgemv3", "lu", "mvt", "seidel",
                                                 "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor",
                                                        "gesummv", "dgemv3", "lu", "mvt",
                                                        "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm",
                             "tensor", "gesummv", "dgemv3", "lu", "mvt",
                             "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "dgemv", "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 4, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    #geom_label_repel(data = . %>% group_by(application) %>%
    #                              filter(technique == "RS") %>%
    #                              filter(best_iteration == min(best_iteration)),
    #                 aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
    geom_vline(aes(xintercept = mean_cost_baseline,
                   size = "-O3"),
               linetype = 8,
               stat = "unique",
               color = "black") +
    #scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
    scale_y_continuous(limits = c(-20, 400),
                       breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_x_continuous(limits = c(0, 2.5),
                       breaks = seq(0, 2.5, length.out = 5)) +
    scale_size_manual("", values = 0.9) +
    #annotation_logticks(sides = "b") +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 30) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          #text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    scale_color_brewer(palette = "Set1")
    #scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 30) +
                theme(text = element_text(family = "serif"),
                      legend.position = c(0.2, 0.5),
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

# mixed_plot_data = bind_rows(mixed_plot_data,
#                             it_data %>%
#                             mutate(technique = gsub(pattern = "DLMT",
#                                                     replacement = "Reusing Designs",
#                                                     x = technique, fixed = TRUE)) %>%
#                             filter(technique != "RS"))

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)
2.1.1.1.8.2 Eliminated Factors and Fixed Values
base_size <- 25

ggplot(subset(eliminated_terms, removed_variables != ""),
       aes(x = removed_variables,
           fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = c(0.8, 0.95),
        legend.background = element_blank(),
        legend.direction = "horizontal",
        axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(name = "Step", palette = "Set2")
2.1.1.1.8.3 Fixed Factors & Values in the Explored Spaces

./img/explored_space_reuse.pdf

./img/models_reuse.pdf

./img/models_fixed_omp_reuse.pdf

2.1.1.1.8.4 Predicted Performance at Each Step

./img/predictions_reuse.pdf

2.1.1.1.8.5 D-Optimality
2.1.1.1.8.5.1 D-Criterion

2.1.1.1.9 Plotting Mixed Experiments
library(ggplot2)
library(dplyr)
library(latex2exp)
library(patchwork)
library(paletteer)

plot_df =  read.csv("data/complete_bicgkernel_data.csv") %>%
    mutate(technique = factor(technique,
                              levels = c("RS",
                                         "Original",
                                         "No Cubic Terms",
                                         "Reusing Designs",
                                         "Running 8 Steps",
                                         "Quantile Regression (tau = 0.05)"),
                              labels = c("RS",
                                         "Original",
                                         "No Cubic",
                                         "Reuse Data",
                                         "Extra Steps",
                                         "QR (5%)")))

p1 = ggplot() +
    geom_jitter(data = plot_df,
                aes(y = min_run_cost,
                    x = technique,
                    color = best_iteration),
                stat = "unique",
                width = 0.2,
                height = 0.0,
                size = 4) +
    geom_hline(data = plot_df,
               aes(yintercept = min(mean_cost_baseline)),
               stat = "unique",
               size = 1.1,
               linetype = 2,
               color = "black") +
    geom_text(data = plot_df,
              aes(x = 0.5,
                  y = min(mean_cost_baseline) - 0.14),
              size = 10,
              stat = "unique",
              label = "Baseline (-O3)",
              hjust = 0,
              color = "black") +
    ylab("Execution Time (s)") +
    scale_color_paletteer_c(name = "Iteration Found",
                            n.breaks = 6,
                            direction = -1,
                            palette = "oompaBase::greenscale") +
    theme_bw(base_size = 33) +
    theme(legend.position = c(0.7, 0.86),
          legend.direction = "horizontal",
          legend.title = element_text(size = 25),
          legend.key.width = unit(2, "cm"),
          axis.title.x = element_blank(),
          legend.background = element_blank())

p2 = ggplot() +
    geom_jitter(data = plot_df,
                aes(y = best_iteration,
                    x = technique,
                    color = min_run_cost),
                stat = "unique",
                width = 0.2,
                height = 0.0,
                size = 4) +
    ylab("Iteration Finding the Best Point") +
    scale_color_paletteer_c(name = "Execution Time",
                            n.breaks = 6,
                            direction = -1,
                            palette = "oompaBase::greenscale") +
    theme_bw(base_size = 33) +
    theme(legend.position = c(0.7, 0.9),
          legend.direction = "horizontal",
          legend.title = element_text(size = 25),
          legend.key.width = unit(2, "cm"),
          axis.title.x = element_blank(),
          legend.background = element_blank())

p1 / p2
2.1.1.1.9.0.1 Iterations that found the Best Configuration
library(grid)
library(gtable)
library(ggrepel)
library(utf8)

it_data <- complete_plot_data

it_data <- it_data %>% subset(application %in% c("bicgkernel", "mm", "tensor", "gesummv",
                                                 "lu", "mvt", "seidel", "jacobi"))
it_data$facet <- factor(it_data$application, levels = c("bicgkernel", "mm", "tensor", "gesummv",
                                                        "lu", "mvt", "seidel", "jacobi"))

it_data$header <- rep("0", nrow(it_data))

it_data[it_data$facet %in% c("bicgkernel", "mm", "tensor", "gesummv", "lu", "mvt", "seidel", "jacobi"), "header"] <- "C"

it_data$header <- factor(it_data$header, levels = c("C"))

levels(it_data$facet) <- c("[+] bicgkernel", "[+] mm", "[+] tensor", "[+] gesummv",
                           "[+] lu", "[+] mvt", "[+] seidel", "[+] jacobi")

it_data$mean_cost_baseline <- unique(it_data$mean_cost_baseline)[1]

p1 <- ggplot(it_data, aes(min_run_cost, best_iteration, color = technique)) +
    facet_wrap(facet ~ ., ncol = 4) +
    geom_point(size = 2, pch = 19) +
    stat_ellipse(type = "t", linetype = 13) +
    #geom_label_repel(data = . %>% group_by(application) %>%
    #                              filter(technique == "RS") %>%
    #                              filter(best_iteration == min(best_iteration)),
    #                 aes(label = technique, x = label_center_x, y = label_center_y), show.legend = FALSE) +
    geom_vline(aes(xintercept = mean_cost_baseline, size = "-O3"), linetype = 8, color = "black") +
    #scale_x_log10(labels = scales::trans_format("log10", scales::math_format(10^.x))) +
    scale_y_continuous(limits = c(-20, 400), breaks = c(0, 50, 100, 150, 200, 250, 300, 350, 400)) +
    scale_size_manual("", values = 0.45) +
    #annotation_logticks(sides = "b") +
    ggtitle("") +
    ylab("Iteration where Best was Found") +
    xlab("Best Cost in Seconds") +
    guides(color = guide_legend(reverse = TRUE)) +
    theme_bw(base_size = 17) +
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          legend.title = element_blank(),
          text = element_text(family = "serif"),
          strip.background = element_rect(fill = "white"),
          plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"))  +
    #scale_color_brewer(palette = "Set1")
    scale_color_grey(start = 0.3, end = 0.7)

dummy <- ggplot(data = it_data, aes(x = min_run_cost, y = best_iteration)) +
                facet_wrap(facet ~ ., scale = "free", ncol = 4) +
                geom_rect(aes(fill = header), xmin = -Inf, xmax = Inf,
                                              ymin = -Inf, ymax = Inf) +
                theme_minimal(base_size = 17) +
                theme(text = element_text(family = "serif"),
                      legend.position = "bottom",
                      legend.direction = "horizontal",
                      legend.title = element_blank(),
                      plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")
                      )  +
                scale_fill_brewer(palette = "Pastel2", direction = -1)
                #scale_fill_grey()

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(dummy)

gtable_select <- function (x, ...)
{
  matches <- c(...)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  x
}

panels <- grepl(pattern = "panel", g2$layout$name)
strips <- grepl(pattern = "strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)
#grid.newpage()
grid.draw(new_strips)

gtable_stack <- function(g1, g2){
  g1$grobs <- c(g1$grobs, g2$grobs)
  g1$layout <- transform(g1$layout, z = z - max(z), name = "g2")
  g1$layout <- rbind(g1$layout, g2$layout)
  g1
}

new_plot <- gtable_stack(g1, new_strips)
#grid.newpage()
grid.draw(new_plot)

The results are very similar to the ones in the paper, but our approach found slightly better configurations slightly faster. Random sampling also found a much better configuration much faster than before in one experiment.

2.1.1.1.9.0.2 Eliminated Factors and Fixed Values

Figure fig:eliminated-terms shows the count of model terms that were eliminated at each color-coded DLMT step. As before, we see that OMP and SCR were the most eliminated factors, especially on the first step.
base_size <- 18

ggplot(subset(eliminated_terms, removed_variables != ""), aes(x = removed_variables, fill = as.factor(step))) +
  geom_bar(stat = "Count", show.legend = TRUE) +
  ylab("Eliminated Terms") +
  xlab("") +
  scale_x_discrete() +
  theme_bw(base_size = base_size) +
  theme(text = element_text(family = "sans"),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  guides(fill = guide_legend(nrow = 1)) +
  scale_fill_brewer(palette = "Set1")
2.1.1.1.9.0.3 Fixed Factors & Values in the Explored Spaces

2.1.1.1.9.0.4 Predicted Performance at Each Step

Figure fig:nc-predicted-best compares the performance predicted by the fitted model, represented by the green line, with the performance of the best point found at each design, represented by the blue line. The red line marks the best point found so far.

It is only at the fourth step of parasilo-17 that the point with the best predicted performance was better than the best point in the design for that step, while also being better than the best point found so far. Although we seem to be effectively restricting the search space with our exploration, which is evidenced by the improvement that occurs as steps progress and by the best points being found inside designs, the models fitted using experiment data are not able to improve the current best in the majority of cases.

2.1.1.1.9.0.5 D-Optimality
2.1.1.1.9.0.5.1 D-Criterion

2.2 November

2.2.1 [2019-11-25 Mon]

2.2.1.1 Gaussian Process Regression on SPAPT kernels

2.2.1.1.1 Results for bicgkernel
2.2.1.1.1.1 Figures

./img/bicgkernel_gaussian_pareto.pdf

./img/bicgkernel_rs_pareto.pdf

2.2.1.1.1.2 Figures

2.2.1.1.1.3 Figures
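
The log below records a Gaussian process fit whose format matches DiceKriging’s km: constant trend, matern5_2 covariance, no nugget, and MLE estimation with BFGS. A minimal sketch of such a fit, with assumed names design_points and measured_cost for the data:

library(DiceKriging)

# design_points: one column per parameter, one row per measured configuration;
# measured_cost: the corresponding execution times.
gp_model <- km(formula = ~1,
               design = design_points,
               response = measured_cost,
               covtype = "matern5_2",
               nugget.estim = FALSE,
               estim.method = "MLE",
               optim.method = "BFGS")
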
optimisation start
------------------
* estimation method   : MLE
* optimisation method : BFGS
* analytical gradient : used
* trend model : ~1
* covariance model :
  - type :  matern5_2
  - nugget : NO
  - parameters lower bounds :  1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10 1e-10
  - parameters upper bounds :  4094 4094 62 62 2 62 4094 2 4094 4094 62 2 4094 4094 4094 4094 2 2 2 4094 2 4094 2 4094 2 30 30 4094 2 30 30 30 30 30 30 30 30 30 62 62 4094 4094 30 4094 30 62 62 30
  - best initial criterion value(s) :  -712.9143

N = 48, M = 5 machine precision = 2.22045e-16
At X0, 0 variables are exactly at the bounds
At iterate     0  f=       712.91  |proj g|=      0.15272
At iterate     1  f =       712.88  |proj g|=       0.15294
ys=-1.679e-05  -gs= 3.601e-02, BFGS update SKIPPED
At iterate     2  f =       711.11  |proj g|=       0.16401
At iterate     3  f =       708.01  |proj g|=       0.18559
ys=-1.555e-01  -gs= 3.027e+00, BFGS update SKIPPED
At iterate     4  f =       703.39  |proj g|=       0.22168
ys=-4.679e-01  -gs= 4.374e+00, BFGS update SKIPPED
At iterate     5  f =       698.95  |proj g|=       0.25502
ys=-3.892e-01  -gs= 4.231e+00, BFGS update SKIPPED
At iterate     6  f =       690.43  |proj g|=        0.2156
At iterate     7  f =       688.99  |proj g|=        0.1577
At iterate     8  f =          688  |proj g|=      0.010538
At iterate     9  f =       687.94  |proj g|=      0.013195
At iterate    10  f =       687.66  |proj g|=       0.04427
At iterate    11  f =       687.49  |proj g|=      0.023682
At iterate    12  f =       687.24  |proj g|=      0.010114
At iterate    13  f =       687.22  |proj g|=      0.010098
At iterate    14  f =       687.17  |proj g|=      0.010069
At iterate    15  f =       686.88  |proj g|=       0.01359
At iterate    16  f =       686.58  |proj g|=     0.0094514
At iterate    17  f =       686.45  |proj g|=       0.02374
At iterate    18  f =       686.28  |proj g|=       0.03131
At iterate    19  f =       686.14  |proj g|=      0.015667
At iterate    20  f =       685.98  |proj g|=     0.0087571
At iterate    21  f =       685.91  |proj g|=     0.0086694
At iterate    22  f =       685.87  |proj g|=     0.0086355
At iterate    23  f =       685.25  |proj g|=      0.028855
At iterate    24  f =        684.9  |proj g|=      0.015802
At iterate    25  f =       684.16  |proj g|=      0.006761
At iterate    26  f =       682.51  |proj g|=      0.012195
At iterate    27  f =       682.18  |proj g|=     0.0074427
At iterate    28  f =       682.16  |proj g|=     0.0048397
At iterate    29  f =       681.51  |proj g|=      0.046568
At iterate    30  f =       681.47  |proj g|=       0.01972
At iterate    31  f =       681.46  |proj g|=     0.0042831
At iterate    32  f =       681.45  |proj g|=     0.0042837
At iterate    33  f =       681.45  |proj g|=     0.0042772
At iterate    34  f =       681.44  |proj g|=      0.016674
At iterate    35  f =       680.36  |proj g|=       0.03849
At iterate    36  f =       677.18  |proj g|=     0.0067063
At iterate    37  f =          677  |proj g|=     0.0030387
At iterate    38  f =       676.96  |proj g|=     0.0038301
At iterate    39  f =       676.81  |proj g|=     0.0046546
At iterate    40  f =       676.64  |proj g|=      0.063529
At iterate    41  f =       676.54  |proj g|=     0.0050253
At iterate    42  f =       676.39  |proj g|=      0.016501
At iterate    43  f =       675.15  |proj g|=        0.1002
At iterate    44  f =       674.62  |proj g|=      0.063694
At iterate    45  f =       674.47  |proj g|=      0.013433
At iterate    46  f =       674.38  |proj g|=      0.016088
At iterate    47  f =       674.23  |proj g|=      0.045203
At iterate    48  f =       673.83  |proj g|=      0.081005
At iterate    49  f =       672.99  |proj g|=       0.10431
At iterate    50  f =       672.61  |proj g|=      0.053047
At iterate    51  f =       672.01  |proj g|=      0.022967
At iterate    52  f =       671.77  |proj g|=      0.042637
At iterate    53  f =       671.06  |proj g|=      0.071004
At iterate    54  f =       670.98  |proj g|=      0.021088
At iterate    55  f =       670.97  |proj g|=    0.00061831
At iterate    56  f =       670.97  |proj g|=    0.00083355
At iterate    57  f =       670.97  |proj g|=     0.0025396
At iterate    58  f =       670.97  |proj g|=     0.0051888
At iterate    59  f =       670.96  |proj g|=     0.0094556
At iterate    60  f =       670.96  |proj g|=      0.015551
At iterate    61  f =       670.95  |proj g|=      0.023173
At iterate    62  f =       670.93  |proj g|=      0.028516
At iterate    63  f =       670.89  |proj g|=      0.021213
At iterate    64  f =       670.87  |proj g|=      0.014479
At iterate    65  f =       670.83  |proj g|=     0.0023803
At iterate    66  f =       670.77  |proj g|=      0.018208
At iterate    67  f =       670.66  |proj g|=      0.032607
At iterate    68  f =       670.64  |proj g|=     0.0050425
At iterate    69  f =        670.4  |proj g|=     0.0040009
At iterate    70  f =       670.17  |proj g|=     0.0085998
At iterate    71  f =       669.94  |proj g|=     0.0020373
At iterate    72  f =       669.93  |proj g|=     0.0073922
At iterate    73  f =        669.9  |proj g|=      0.021443
At iterate    74  f =       669.81  |proj g|=      0.048122
At iterate    75  f =       669.61  |proj g|=      0.079202
At iterate    76  f =       669.41  |proj g|=      0.080633
At iterate    77  f =       669.22  |proj g|=      0.028117
At iterate    78  f =       669.18  |proj g|=     0.0045062
At iterate    79  f =       669.18  |proj g|=     0.0010684
At iterate    80  f =       669.18  |proj g|=     0.0019219
At iterate    81  f =       669.18  |proj g|=     0.0021834
At iterate    82  f =       669.17  |proj g|=    0.00082061
At iterate    83  f =       669.16  |proj g|=     0.0052274
At iterate    84  f =       669.14  |proj g|=     0.0096021
At iterate    85  f =        669.1  |proj g|=     0.0028422
At iterate    86  f =       669.06  |proj g|=      0.033474
At iterate    87  f =       668.99  |proj g|=      0.026382
At iterate    88  f =       668.66  |proj g|=      0.010887
At iterate    89  f =       668.62  |proj g|=     0.0090599
At iterate    90  f =       668.57  |proj g|=     0.0061299
At iterate    91  f =       668.26  |proj g|=      0.014394
At iterate    92  f =       667.89  |proj g|=      0.028062
At iterate    93  f =       667.26  |proj g|=      0.063625
At iterate    94  f =       666.72  |proj g|=      0.027098
At iterate    95  f =       666.13  |proj g|=      0.011637
At iterate    96  f =       665.98  |proj g|=     0.0075273
At iterate    97  f =       665.86  |proj g|=      0.011719
At iterate    98  f =       665.54  |proj g|=      0.023306
At iterate    99  f =       665.46  |proj g|=         0.026
At iterate   100  f =       665.33  |proj g|=      0.012704
At iterate   101  f =       664.82  |proj g|=       0.01516
final  value 664.821889
stopped after 101 iterations
library(ggplot2)
library(dplyr)

# x comes from a previous block (not shown here) and is assumed to hold
# Gaussian Process prediction summaries in x$S (means, variances, and
# confidence bounds for each predicted point).
df <- data.frame(mean = x$S$mean[1, ],
                 var = x$S$var[1, ],
                 varPG = x$S$varPG[1, ],
                 ci = sqrt(x$S$var[1, ]) * 1.96,
                 ci_inf = x$S$ci[1, ],
                 ci_sup = x$S$ci[2, ]
                 )

df <- df %>%
  dplyr::mutate(id = row_number())

ggplot(df) +
  geom_point(aes(x = id, y = mean), size = 4) +
  geom_errorbar(aes(x = id, ymin = mean - ci, ymax = mean + ci)) +
  #geom_errorbar(aes(x = id, ymin = mean - varPG, ymax = mean + varPG)) +
  theme_bw(base_size = 28)


./img/gp_rs_dgemv_comparison.pdf

|    |  v1 |  v2 |
| v1 | 1.0 | 0.0 |
| v2 | 0.0 | 1.0 |
library(latex2exp)
library(MASS)

n <- 2
sigma <- data.frame(d1 = c(1, 0), d2 = c(0, 1))
means <- c(0, 0)

mv_sample <- mvrnorm(n = 300, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)

library(ggplot2)
library(GGally)

ggpairs(mv_sample) +
  theme_bw(base_size = 28)

To get strong correlations, the covariance matrix could be:

|    |  v1 |  v2 |
| v1 | 1.0 | 0.8 |
| v2 | 0.8 | 1.0 |
library(MASS)

n <- 2
sigma <- data.frame(d1 = c(1, 0.8), d2 = c(0.8, 1))
means <- c(0, 0)

mv_sample <- mvrnorm(n = 600, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)

names(mv_sample) <- paste("V", seq(1, n), sep = "")
library(ggplot2)
library(GGally)

ggpairs(mv_sample) +
  theme_bw(base_size = 26)
2.2.1.1.1.4 10-Dimensional Example: Reinterpreting Samples
library(latex2exp)
library(MASS)

n <- 10
sigma <- data.frame(diag(10))
names(sigma) <- paste("d", seq(1, n), sep = "")

means <- rep(0, n)

mv_sample <- mvrnorm(n = 300, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)

names(mv_sample) <- paste("d", seq(1, n), sep = "")
library(ggplot2)
library(GGally)

ggpairs(mv_sample) +
  theme_bw(base_size = 22)
library(latex2exp)
library(MASS)

n <- 10
sigma <- data.frame(diag(10))
names(sigma) <- paste("d", seq(1, n), sep = "")

means <- rep(0, n)

mv_sample <- mvrnorm(n = 1000, means, as.matrix(sigma))
mv_sample <- as.data.frame(mv_sample)

names(mv_sample) <- paste("d", seq(1, n), sep = "")
library(dplyr)
library(tidyr)
library(ggplot2)
library(latex2exp)

plot_data <- mv_sample
sampled_function <- sample_n(plot_data, 1)

plot_data <- plot_data %>%
  gather("x", "f_x") %>%
  mutate(x = ordered(x, levels = names(mv_sample)))

sampled_function <- sampled_function %>%
  gather("x", "f_x") %>%
  mutate(x = ordered(x, levels = names(mv_sample)))

ggplot(plot_data, aes(x = x, y = f_x)) +
  geom_jitter(color = "gray48", size = 3, width = 0.25, alpha = 0.2) +
  geom_point(data = sampled_function,
             aes(color = "Sample of Multivariate Normal"),
             size = 4) +
  geom_line(data = sampled_function,
            color = "red",
            size = 1,
            alpha = 0.3) +
  ylab(TeX("Sampled Values")) +
  xlab(TeX("Dimensions")) +
  scale_fill_manual("", values = "gray48") +
  scale_color_brewer(palette = "Set1") +
  theme_bw(base_size = 26) +
  theme(legend.title = element_blank(),
        legend.background = element_rect(fill = "transparent"),
        legend.position = c(0.24, 0.06))
library(dplyr)
library(tidyr)
library(ggplot2)
library(latex2exp)

n <- 10

plot_data <- mv_sample
names(plot_data) <- seq(1, n)

sampled_function <- sample_n(plot_data, 1)

plot_data <- plot_data %>%
  gather("x", "f_x") %>%
  mutate(x = as.numeric(x))

sampled_function <- sampled_function %>%
  gather("x", "f_x") %>%
  mutate(x = as.numeric(x))

ggplot(plot_data, aes(x = x, y = f_x)) +
  geom_jitter(color = "gray48", size = 3, width = 0.25, alpha = 0.2) +
  geom_point(data = sampled_function,
             aes(color = "Sampled Function"),
             size = 4) +
  geom_line(data = sampled_function,
            color = "red",
            size = 1,
            alpha = 0.3) +
  ylab(TeX("Samples of $(d_1,\\ldots,d_{10})$ interpreted as $f(x \\in \\lbrack 1,10 \\rbrack)$")) +
  xlab(TeX("$(d_1,\\ldots,d_{10})$ interpreted as discrete $x \\in \\lbrack 1,10 \\rbrack$")) +
  scale_x_discrete(limits = seq(1, 10)) +
  scale_fill_manual("", values = "gray48") +
  scale_color_brewer(palette = "Set1") +
  theme_bw(base_size = 26) +
  theme(legend.title = element_blank(),
        legend.background = element_rect(fill = "transparent"),
        legend.position = c(0.2, 0.06))
2.2.1.1.2 Sampling Functions
d <- 2
n <- 30

target_X <- expand.grid(x1 = seq(-1, 1, length = n), x2 = seq(-1, 1, length = n))
library(ggplot2)
library(GGally)

ggpairs(target_X) +
  theme_bw(base_size = 26)
library(Rcpp)
library(dplyr)
library(MASS)

src <-
"#include <Rcpp.h>
#include <math.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix rcpp_squared_exponential_kernel(NumericMatrix x,
                                              NumericMatrix y,
                                              float amplitude,
                                              NumericVector lengthscale){
  NumericMatrix output(x.nrow(), y.nrow());
  float distance;

  for(int i = 0; i < x.nrow(); i++) {
    for(int j = 0; j < y.nrow(); j++) {
      distance = 0;
      for(int k = 0; k < x.ncol(); k++) {
        distance += (((x(i, k) - y(j, k)) *
                      (x(i, k) - y(j, k))) /
                      (lengthscale(k) * lengthscale(k)));
      }

      output(i, j) = amplitude *
                     amplitude *
                     exp(-0.5 * distance);
    }
  }
  return(output);
}"

sourceCpp(code = src)

random_design <- function(factors, size) {
  data.frame(replicate(factors, runif(size)))
}

simulate_gp <- function(covariance_matrix) {
  return(unname(mvrnorm(n = 1,
                        rep(0.0, nrow(covariance_matrix)),
                        covariance_matrix)))
}

sqexp_covariance_matrix <- function(data, amplitude, lengthscale) {
  return(rcpp_squared_exponential_kernel(as.matrix(data),
                                         as.matrix(data),
                                         amplitude,
                                         lengthscale))
}

significance_probability <- 0.10
amplitude <- 1.0
lengthscale <- rep(0.2, d) # one lengthscale per input dimension of target_X

cov_matrix <- sqexp_covariance_matrix(target_X, amplitude, lengthscale)
plot_data <- target_X
library(DiceKriging)
library(dplyr)
library(RColorBrewer)
library(lattice)

se_kernel <- function(x, x_p, l = 0.2) {
  # Squared-exponential kernel, written to match the Rcpp kernel above:
  # k(x, x') = exp(-(x - x')^2 / (2 * l^2))
  return(exp(-(
    ((x - x_p) ^ 2) /
    (2 * (l ^ 2))
  )))
}

d <- 2
n <- 32

point_grid <- data.frame(x1 = seq(-1, 1, length = n), x2 = rep(0, n))
evaluated_grid <- data.frame(point_grid)

evaluated_grid$y <- se_kernel(point_grid$x1, point_grid$x2)

colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)

ggplot(evaluated_grid, aes(x = x1, y = y)) +
  geom_point() +
  theme_bw(base_size = 18)
library(lattice)
library(RColorBrewer)

plot_data$y <- simulate_gp(cov_matrix)

colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)

wireframe(y ~ x1 * x2,
          data = plot_data,
          xlab = "x1",
          ylab = "x2",
          zlab = list("k(x,x')",
                      rot = "90"),
          #zlim = range(seq(0.0, 1.0, by = 0.5)),
          colorkey = FALSE,
          col.regions = colors,
          drape = TRUE,
          #shade = TRUE,
          lattice.options = lattice.options(list(border = FALSE)),
          scales = list(arrows = FALSE),
          screen = list(z = 20, x = -60, y = 0),
          par.settings = list(axis.line = list(col = "transparent")))
2.2.1.1.3 Regression, Sampling, EI
library(ggplot2)
library(dplyr)
library(DiceKriging)
library(DiceOptim)

gpr_data <- data.frame(x = c(0.1, 0.25, 0.3, 0.67, 0.8),
                       y = c(0.1, 0.1, 0.6, -0.02, 0.1))

# gpr_data <- data.frame(x = c(0.45, 0.5, 0.55),
#                        y = c(-0.1, 0.08, 0.2))

x_allowed <- data.frame(x = seq(0, 1, length = 300))

# reg <- km(formula = ~ I(x^2), design = dplyr::select(gpr_data, -y),
#           control = list(pop.size = 40,
#                          BFGSburnin = 4),
#           response = gpr_data$y)

reg <- km(design = dplyr::select(gpr_data, -y),
          control = list(pop.size = 400,
                         BFGSburnin = 400),
          response = gpr_data$y)

pred <- predict(reg, x_allowed, "UK")

x_allowed$y <- pred$mean
x_allowed$ymin <- pred$mean - (2 * pred$sd)
x_allowed$ymax <- pred$mean + (2 * pred$sd)
x_allowed$ei <- apply(dplyr::select(x_allowed, x), 1, EI, reg)
x_allowed$sampled_y <- simulate(reg, cond = TRUE, newdata = dplyr::select(x_allowed, x))[1, ]
x_allowed$uncond_y <- simulate(reg, cond = FALSE, newdata = dplyr::select(x_allowed, x))[1, ]

ggplot(data = gpr_data, aes(x = x, y = y)) +
  geom_ribbon(data = x_allowed, aes(x = x, ymin = ymin, ymax = ymax, fill = "CI of the Mean"), alpha = 0.3) +
  geom_line(data = x_allowed, size = 1, aes(color = "Predicted Mean")) +
  #geom_line(data = x_allowed, size = 1, aes(x = x, y = sampled_y, color = "Conditioned Sample")) +
  #geom_line(data = x_allowed, size = 1, aes(x = x, y = uncond_y, color = "Prior Sample")) +
  geom_line(data = x_allowed, size = 1, aes(x = x, y = ei, color = "Expected Improvement")) +
  geom_point(stroke = 2, shape = 3, size = 3, aes(color = "Observed")) +
  geom_point(data = subset(x_allowed, ei == max(ei)), size = 4, stroke = 2, shape = 3, aes(x = x, y = ei, color = "Maximum EI")) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_manual("", values = "gray48") +
  theme_bw(base_size = 26) +
  theme(legend.title = element_blank(),
        legend.background = element_rect(fill = "transparent"),
        legend.position = c(0.2, 0.2))

2.2.1.2 Plotting Covariance Kernels

2.2.1.2.1 Constant
library(DiceKriging)
library(dplyr)
library(lattice)
library(RColorBrewer)

constant_kernel <- function(x) {
  return(1.0)
}

d <- 2
n <- 16

point_grid <- expand.grid(x1 = seq(0, 1, length = n), x2 = seq(0, 1, length = n))
evaluated_grid <- data.frame(point_grid)

evaluated_grid$y <- apply(point_grid, 1, constant_kernel)

str(evaluated_grid)

colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)

wireframe(y ~ x1 * x2,
          data = evaluated_grid,
          xlab = "x1",
          ylab = "x2",
          zlab = list("k(x,x')",
                      rot = "90"),
          zlim = range(seq(0.0, 2.0, by = 0.5)),
          colorkey = FALSE,
          col.regions = colors,
          drape = TRUE,
          #shade = TRUE,
          lattice.options = lattice.options(list(border = FALSE)),
          scales = list(arrows = FALSE),
          screen = list(z = 140, x = -60, y = 0),
          par.settings = list(axis.line = list(col = "transparent")))

2.2.1.2.2 Analyzing Data
results_data <- select(complete_plot_data,
                       OMP, SCR, VEC1, VEC2,
                       RT_I, RT_J, U1_I, T1_I,
                       T1_J, T2_I, T2_J, U_I, U_J,
                       cost_mean)
summary(aov(cost_mean ~ .*., results_data))

nobin_results_data <- results_data %>%
                      filter(OMP == "True", SCR == "True",
                             VEC1 == "True", VEC2 == "True") %>%
                      select(-OMP, -VEC1,
                             -VEC2, -SCR)

summary(aov(cost_mean ~ .*., nobin_results_data))

2.2.1.2.3 Estimating GPR Kernel Parameters
results_data <- select(complete_plot_data,
                       OMP, SCR, VEC1, VEC2,
                       RT_I, RT_J, U1_I, T1_I,
                       T1_J, T2_I, T2_J, U_I, U_J,
                       cost_mean)
library(DiceKriging)
library(dplyr)
library(stringr)
library(rsm)

sampled_results_data <- sample_n(results_data, 5000)

sampled_results_data <- sampled_results_data %>% mutate_all(funs(str_replace(., "True", "1"))) %>%
  mutate_all(funs(str_replace(., "False", "0"))) %>%
  mutate_all(funs(as.integer(.)))

coded_sampled_design <- coded.data(select(sampled_results_data, -cost_mean),
                                   formulas = list(T1_Ie = T1_Ie ~ (T1_I - 5.5) / 5.5,
                                                   T1_Je = T1_Je ~ (T1_J - 5.5) / 5.5,
                                                   U_Je = U_Je ~ (U_J - 14.5) / 14.5,
                                                   U_Ie = U_Ie ~ (U_I - 14.5) / 14.5,
                                                   T2_Ie = T2_Ie ~ (T2_I - 5.5) / 5.5,
                                                   T2_Je = T2_Je ~ (T2_J - 5.5) / 5.5,
                                                   U1_Ie = U1_Ie ~ (U1_I - 14.5) / 14.5,
                                                   OMPe = OMPe ~ (OMP - 0.5) / 0.5,
                                                   SCRe = SCRe ~ (SCR - 0.5) / 0.5,
                                                   VEC1e = VEC1e ~ (VEC1 - 0.5) / 0.5,
                                                   VEC2e = VEC2e ~ (VEC2 - 0.5) / 0.5,
                                                   RT_Ie = RT_Ie ~ (RT_I - 2.5) / 2.5,
                                                   RT_Je = RT_Je ~ (RT_J - 2.5) / 2.5))

reg <- km(design = coded_sampled_design,
          response = sampled_results_data$cost_mean)

reg

3 2020

3.1 February

3.1.1 [2020-02-05 Wed]

3.1.1.1 Review for COMCOM

The paper presents the ε-sticky algorithm, an extension of the ε-greedy algorithm for the Multi-Armed Bandit problem to the selection of Access Points by user stations, and evaluates the proposed approach in two scenarios that differ in the distribution of user stations. The proposed approach yields significant improvements in user station satisfaction relative to a Strongest Signal Access Point selection algorithm. The paper significantly extends the authors’ previous work, providing a brief introduction to the Multi-Armed Bandit problem and its variants, and a considerably more extensive evaluation and validation of the proposed approach. The experimental methodology is solid, and the proposed approach convincingly outperforms the Strongest Signal selection algorithm in all scenarios studied. The study of the best values of ε was also interesting, since choosing a good value for this parameter has a significant impact on the observed throughput.

3.1.1.2 Applications of Program Autotuning: Call for Proposals for Funding for Undergraduates, Masters, and PhD Students

We are glad to announce a call for masters and PhD scholarship proposals for a research project, done in the context of a collaboration between the University of São Paulo and the Hewlett Packard Enterprise company, starting on February 2020.

The work to be performed by candidates will require 10 hours per week, and will involve the application of techniques for statistical optimization and design of experiments to a problem presented by the candidate. We are looking for students interested in optimizing their applications, or applications used in their work, targeting different metrics, such as performance, memory or network usage.

Ideal problems for this project include problems where the search spaces involved are too large to be explored exhaustively, and have no trivial configuration. Examples of problems include the selection of hyperparameters of Machine Learning algorithms to increase prediction accuracy, selecting compiler flags to improve performance, and configuring user-developed applications to improve user-defined metrics.

We would prefer students who have already stated their research problems and made some progress towards measuring and evaluating their objectives, but highly motivated students who are just starting are also welcome to apply.

Up to 5 accepted candidates will receive funding, comparable to scholarships from CAPES, paid by IME/USP for eight months, between February and September 2020. Students currently without funding are preferred.

We invite interested students to submit a CV and a cover letter, summarizing their problem and their interest in applying optimization techniques to it, by February 20th 2020. We assume your advisor is aware of and consents to your participation in this project. Accepted candidates will be invited to a two-day workshop at IME/USP, where techniques and libraries for statistical optimization and design of experiments will be presented, and candidates will be asked to give short presentations about their work. Final acceptance notification will be sent by February 29th.

Project Coordinator: Prof. Dr. Alfredo Goldman
Project Manager: Pedro Bruel

3.1.2 [2020-02-06 Thu]

3.1.2.1 Autotuning for GCC

3.1.2.1.1 Some References
3.1.2.1.2 Ideas
3.1.2.1.2.1 Benchmark
  • Ofast
    • unsafe optim
    • error rate
3.1.2.1.2.2 Search Space
  • Focus on flags at first:
    • Explore numerical, categorical parameters later
    • LTO
  • Which flags should be chosen?
  • Baselines:
    • O0, …, O3, (Ofast, …?)
    • Random sample of flag configurations
3.1.2.1.2.3 Metrics / Search Objectives
  • Performance, memory, binary size, …
  • Focus on one metric first
    • Pareto frontier of the others
  • We could try something like a normalized, weighted sum of metrics (a minimal sketch follows this list)
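A minimal sketch of the normalized, weighted sum idea from the list above; the metric names, baseline values, and weights are all illustrative assumptions.

normalized_weighted_sum <- function(measured, baseline, weights) {
  # Normalize each metric by its baseline measurement (e.g., the -O3 build),
  # so values below 1 mean "better than baseline", then combine them.
  normalized <- measured / baseline
  sum(weights * normalized) / sum(weights)
}

normalized_weighted_sum(measured = c(time = 1.8, memory = 510, size = 12.1),
                        baseline = c(time = 2.0, memory = 480, size = 11.5),
                        weights  = c(time = 0.6, memory = 0.2, size = 0.2))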
3.1.2.1.2.4 Experiments and Analysis Plan
  • Plan a screening experiment: Identify Main Effects
  • Leverage expert knowledge plus screening to plan next experiments
  • Low-discrepancy sampling, linear model, ANOVA
  • Gaussian Process Regression?
3.1.2.1.3 Plan
  1. Define a benchmark
    • Firefox
    • SPEC
    • BLAS Lapack
  2. Define metrics
  3. Define the search space
    1. Flags
    2. Others
  4. Start experiments
    1. Random sample
    2. Screening
    3. Linear models + ANOVA
    4. Linear models + ANOVA + Optimal Design
    5. Gaussian Process Regression

3.1.3 [2020-02-13 Thu]

3.1.3.1 Experimental Design for a 2D Problem

3.1.3.1.1 Defining a Problem

Consider the following function of two factors $(x_1, x_2)$:

\[ f(x_1, x_2) = 2.3 + (1.1x_1) + (1.6x_1^2) + (2.2x_2) + (3.2x_2^2) + (2.7x_1x_2) + ε \]

That is, $f(x_1, x_2)$ has linear and quadratic dependence on $(x_1, x_2)$, plus an interaction between $x_1$ and $x_2$. Here, ε is a normally distributed additive error with mean zero and variance one.

In R, we can write $f$ as:

f <- function(x1, x2) {
  return(2.3 + (1.1 * x1) + (1.6 * x1 * x1) +
         (2.2 * x2) + (3.2 * x2 * x2) + (2.7 * x1 * x2) + rnorm(length(x1)))
}

# Measuring 5 values of f(x1, x2)
f(rnorm(5), rnorm(5))

The inputs of $f$ are defined to be real numbers in the interval $[-1.0, 1.0]$. Since we cannot generate the set of all possible inputs of $f$, we have to settle for a sample of limited size. Say that we cover the $(x_1, x_2)$ space with a grid of points in $[-1.0, 1.0]$, spaced by $0.05$ on each axis. In R, we can write:

resolution <- 0.05

sample_grid <- expand.grid(x1 = seq(-1.0, 1.0, by = resolution),
                           x2 = seq(-1.0, 1.0, by = resolution))

sample_grid$f <- f(sample_grid$x1, sample_grid$x2)

str(sample_grid)

Plotting this low-resolution view of the space, we get:

library(lattice)
library(RColorBrewer)

colors = colorRampPalette(brewer.pal(11, "RdYlBu"))(69)

wireframe(f ~ x1 * x2,
          data = sample_grid,
          xlab = "x1",
          ylab = "x2",
          zlab = list("f(x1,x2)",
                      rot = "90"),
          zlim = range(seq(min(sample_grid$f) - 2.0, max(sample_grid$f) + 2.0, by = 0.5)),
          colorkey = FALSE,
          col.regions = colors,
          drape = TRUE,
          #shade = TRUE,
          lattice.options = lattice.options(list(border = FALSE)),
          scales = list(arrows = FALSE),
          screen = list(z = 20, x = -60, y = 0),
          par.settings = list(axis.line = list(col = "transparent")))
library(dplyr)
library(AlgDesign)

budget <- 200
sample_rs <- sample_n(sample_grid, budget)

sample_rs$method <- "Random Design"
exploration <- sample_rs

sample_linear_model <- optFederov(~ x1 + x2 + x1:x2,
                                  data = sample_grid,
                                  nTrials = budget)$design
sample_linear_model$method <- "Linear Model Design"
exploration <- bind_rows(exploration,
                         sample_linear_model)

sample_quadratic_model <- optFederov(~ (x1 + x2) + (x1:x2) + I(x1 ^ 2) + I(x2 ^ 2),
                                     data = sample_grid,
                                     nTrials = budget)$design

sample_quadratic_model$method <- "Quadratic Model Design"
exploration <- bind_rows(exploration,
                         sample_quadratic_model)

str(exploration)
library(ggplot2)

ggplot(data = exploration, aes(x = x1, y = x2)) +
  facet_wrap(method ~ ., ncol = 3) +
  geom_point(size = 3) +
  theme_bw(base_size = 22)
3.1.3.1.2 Fitting and Predicting Values

We now remember the definition of $f$:

\[ f(x_1, x_2) = 2.3 + (1.1x_1) + (1.6x_1^2) + (2.2x_2) + (3.2x_2^2) + (2.7x_1x_2) + ε \]

If we try to obtain a regression function for $f$, testing the quadratic hypotheses using the different designs, we would get:

rs_lm <- lm(f ~ x1 * x2 * I(x1 ^ 2) * I(x2 ^ 2), data = filter(exploration, method == "Random Design"))
lin_lm <- lm(f ~ x1 * x2 * I(x1 ^ 2) * I(x2 ^ 2), data = filter(exploration, method == "Linear Model Design"))
quad_lm <- lm(f ~ x1 * x2 * I(x1 ^ 2) * I(x2 ^ 2), data = filter(exploration, method == "Quadratic Model Design"))

coef(rs_lm)
coef(lin_lm)
coef(quad_lm)

rs_lm <- aov(f ~ x1 * x2 + I(x1 ^ 2) + I(x2 ^ 2), data = filter(exploration, method == "Random Design"))
quad_lm <- aov(f ~ x1 * x2 + I(x1 ^ 2) + I(x2 ^ 2), data = filter(exploration, method == "Quadratic Model Design"))
summary.aov(rs_lm)

prediction_rs <- predict(rs_lm, newdata = select(sample_grid, -f))
prediction_error <- data.frame(error = sample_grid[prediction_rs == min(prediction_rs), ]$f - min(prediction_rs))
prediction_error$method <- "Random Design"

prediction_lin <- predict(lin_lm, newdata = select(sample_grid, -f))
prediction_error_lin <- data.frame(error = sample_grid[prediction_lin == min(prediction_lin), ]$f - min(prediction_lin))
prediction_error_lin$method <- "Linear Model Design"

prediction_error <- bind_rows(prediction_error,
                              prediction_error_lin)

prediction_quad <- predict(quad_lm, newdata = select(sample_grid, -f))
prediction_error_quad <- data.frame(error = sample_grid[prediction_quad == min(prediction_quad), ]$f - min(prediction_quad))
prediction_error_quad$method <- "Quadratic Model Design"

prediction_error <- bind_rows(prediction_error,
                              prediction_error_quad)

prediction_error
3.1.3.1.3 Overview of Autotuning Methods

3.1.4 [2020-02-18 Tue]

3.1.4.1 Functions in R

my_function <- function(x, y, z) {
  x + y                              # evaluated, but its value is discarded
  paste(z, collapse = "", sep = "")  # only the last expression is returned
}

my_function(2, 3, c("a", "v", "c"))
3.1.4.1.1 Manifesto for the Reproducibility of Computer Science in Brazil

Conclusions drawn from experimentally obtained data cannot be considered validated until they can be reproduced under independent experimental conditions. This principle guides all scientific progress based on experimental methodologies. Experimental research in Computer Science is in a singular position to reinforce and promote scientific reproducibility, because computational experiments can, in certain cases, be tracked, recorded, and repeated with a precision and control that are practically impossible in fields such as biology and chemistry. Organizations supporting Brazilian science therefore have strong reasons to promote and reinforce scientific reproducibility.

The ACM's efforts are a good example of the initial steps that can be taken in favor of reproducibility. Several ACM conferences and journals adopt a badge system to mark papers whose efforts toward the reproducibility of their experiments are significant. The nomenclature used by the ACM is derived from the International Vocabulary of Metrology (VIM), and distinguishes between results and conclusions that can be reproduced:

  • By the same team, under the same experimental conditions (Repeatability)
  • By a different team, under the same experimental conditions (Replicability)
  • By a different team, under different experimental conditions (Reproducibility)

The code and data supporting the conclusions of a scientific study should be submitted together with the document to be published. These artifacts are evaluated by reviewers, and badges are awarded according to the level of reproducibility achieved. Other organizations also promote reproducibility, such as ReScience, which recently ran the Ten Years Reproducibility Challenge, in which researchers were encouraged to submit papers reproducing their own results that were at least 10 years old.

Actions like those taken by the ACM can have a great impact on reinforcing the credibility of the scientific method and on advancing scientific discovery in Brazil.

3.2 March

3.2.1 [2020-03-11 Wed]

3.2.1.1 Review for ICS2020

3.2.1.1.1 Summary

The paper presents an interesting study of the performance of a redundant operation detection tool, CIDetector. The tool is used to identify regions in the machine code generated by a compiler that contain redundant operations. These regions are then modified to remove redundancies, and the percentage of redundancy reduction and the resulting speedups are reported. The paper is clearly structured and easy to follow.

The paper evaluates the detection tool on 14 programs composing the CIBenchmark, which is also introduced by the paper. CIDetector is tested on three GCC versions, one ICC version, and one LLVM version, measuring the reduction in redundant operations obtained by modifying the regions detected to produce these kinds of redundancies. In some scenarios, eliminating redundant operations produces execution time speedups.

Strangely, the paper does not provide access to any of its source code or data, but mentions that the code will be open-sourced if the paper is accepted. Besides being a questionable practice, this means I was not able to independently verify or validate any of the data or code presented in this paper.

The paper does not discuss whether the reported results are means over a certain number of executions, and no standard deviation or confidence interval of the mean is presented. It is strongly suggested that these statistics and discussions be added to the paper, in order to strengthen its conclusions.

Overall, the paper presents valuable insights and careful evaluation and discussion of the results. I believe this to be a borderline paper, which could be accepted provided the statistical analysis methodology is clearly presented and discussed. I also believe it would be interesting to compare the performance of the code generated by different compilers.

3.2.2 [2020-03-13 Fri]

3.2.2.1 First results from Emanuel’s work

llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments
Pass Arguments:  -tti -tbaa -scoped-noalias -assumption-cache-tracker -targetlibinfo -verify -ee-instrument -simplifycfg -domtree -sroa -early-cse -lower-expect
Pass Arguments:  -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -called-value-propagation -attributor -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -basicaa -aa -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -libcalls-shrinkwrap -loops -branch-prob -block-freq -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -pgo-memop-opt -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -loop-rotate -licm -loop-unswitch -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -indvars -loop-idiom -loop-deletion -loop-unroll -mldst-motion -phi-values -basicaa -aa -memdep -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -gvn -phi-values -basicaa -aa -memdep -memcpyopt -sccp -demanded-bits -bdce -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -lazy-value-info -jump-threading -correlated-propagation -basicaa -aa -phi-values -memdep -dse -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -licm -postdomtree -adce -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -barrier -elim-avail-extern -basiccg -rpo-functionattrs -globalopt -globaldce -basiccg -globals-aa -float2int -domtree -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -loop-rotate -loop-accesses -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -loop-distribute -branch-prob -block-freq -scalar-evolution -basicaa -aa -loop-accesses -demanded-bits -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -loop-vectorize -loop-simplify -scalar-evolution -aa -loop-accesses -lazy-branch-prob -lazy-block-freq -loop-load-elim -basicaa -aa -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -domtree -loops -scalar-evolution -basicaa -aa -demanded-bits -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -slp-vectorizer -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -loop-unroll -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -licm -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -transform-warning -alignment-from-assumptions -strip-dead-prototypes -globaldce -constmerge -domtree -loops -branch-prob -block-freq -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -block-freq -loop-sink -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instsimplify -div-rem-pairs -simplifycfg -verify
Pass Arguments:  -domtree
Pass Arguments:  -targetlibinfo -domtree -loops -branch-prob -block-freq
Pass Arguments:  -targetlibinfo -domtree -loops -branch-prob -block-freq

3.2.3 [2020-03-16 Mon]

3.2.3.1 Tests

[1] "test"

:

[1] 3

:

[1] 3
test
4

3.2.4 [2020-03-17 Tue]

3.2.4.1 Map Intervals

3.2.4.2 “Circle Regression”

library(dplyr)
library(ggplot2)

map_intervals <- function(x, interval_from, interval_to) {
    new_x <- x - interval_from[1]
    new_x <- new_x / (interval_from[2] - interval_from[1])
    new_x <- new_x * (interval_to[2] - interval_to[1])
    new_x <- new_x + interval_to[1]

    return(new_x)
}

df <- data.frame(x = map_intervals(runif(100), c(0.0, 1.0), c(-1.0, 1.0)),
                 y = map_intervals(runif(100), c(0.0, 1.0), c(-1.0, 1.0)))

# Assumed completion: the original line was left unfinished. Generate y values
# near the unit circle so a "circle regression" can be attempted on (x, y).
df$y <- sqrt(pmax(0, 1 - df$x ^ 2)) * sample(c(-1, 1), 100, replace = TRUE) +
    rnorm(100, sd = 0.05)

3.2.5 [2020-03-24 Tue]

3.2.5.1 Meeting May 24

3.2.6 [2020-03-27 Fri]

3.2.6.1 Design of Experiments Step-by-Step

Formulas in R:

y = (alpha * x1) + (beta * x2)  ⇒  y ~ x1 + x2

y = (α_1 * x1) + (α_2 * x2) + (α_3 * x1 * x2)  ⇒  y ~ x1 * x2

y = (α_3 * x1 * x2)  ⇒  y ~ x1:x2

y = (α_1 * x1) + (α_2 * x2) + (α_3 * x1 * x2)

Loading required package: SparseM

Attaching package: ‘SparseM’

The following object is masked from ‘package:base’:

    backsolve
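
To check how the formula notation above expands into model terms, model.matrix() can be inspected on a throwaway data frame; the values here are arbitrary.

d <- data.frame(x1 = c(-1, 0, 1), x2 = c(1, 0, -1))

colnames(model.matrix(~ x1 + x2, d))  # "(Intercept)" "x1" "x2"
colnames(model.matrix(~ x1 * x2, d))  # adds the "x1:x2" interaction term
colnames(model.matrix(~ x1:x2, d))    # "(Intercept)" "x1:x2"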

3.3 April

3.3.1 [2020-04-14 Tue]

3.3.1.1 HPE Finance Meeting

  • February expenses?
  • Scholarship:
    • It is not an extra request
  • Hardware: COVID-related problems
    • Student access: a security concern

3.3.2 [2020-04-17 Fri]

3.3.2.1 Sobol and Uniform samples

library(randtoolbox)
library(dplyr)
library(tidyr)
library(ggplot2)

map_intervals <- function(x, interval_from, interval_to) {
    new_x <- x - interval_from[1]
    new_x <- new_x / (interval_from[2] - interval_from[1])
    new_x <- new_x * (interval_to[2] - interval_to[1])
    new_x <- new_x + interval_to[1]

    return(new_x)
}

n = 216
d = 108

temp_sobol <- torus(n = n,
                    dim = d,
#                     scrambling = 1,
#                     seed = as.integer((99999 - 10000) * runif(1) + 10000),
                    init = TRUE)
rm(temp_sobol)

# Note: the design below is generated with halton(); the torus() call above only
# initializes the randtoolbox generator state.
design <- data.frame(halton(n = n,
                           dim = d,
                           # scrambling = 2,
                           # seed = as.integer((99999 - 10000) * runif(1) + 10000),
                           init = FALSE)) %>%
    map_intervals(c(0.0, 1.0), c(1.0, 8.0)) %>%
    round()

design$method <- "Sobol"

rs_design <- data.frame(matrix(runif(n * d), ncol = d, nrow = n)) %>%
    map_intervals(c(0.0, 1.0), c(1.0, 8.0)) %>%
    round()

rs_design$method <- "Uniform"

design <- bind_rows(design, rs_design) %>%
    pivot_longer(-method, names_to = "Layer", values_to = "Bitwidth")

ggplot() +
    geom_histogram(data = design,
                   bins = 8,
                   aes(x = Bitwidth,
                       y = ..count..)) +
    facet_wrap(. ~ method, ncol = 1)

3.3.2.2 Looking at Top500 Data

3.3.2.2.1 Cloning Repository
fatal: destination path 'top500' already exists and is not an empty directory.
Already up to date.
3.3.2.2.2 Loading Data
3.3.2.2.3 Looking at Data
3.3.2.2.3.1 Column Names
'data.frame':	27000 obs. of  52 variables:
 $ Year                           : int  1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 ...
 $ Month                          : int  6 6 6 6 6 6 6 6 6 6 ...
 $ Day                            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Rank                           : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Site                           : Factor w/ 2467 levels " Institute of Information and Communication Technologies at the Bulgarian Academy of Sciences",..: 1287 1379 1513 1486 1516 135 1507 277 468 563 ...
 $ Manufacturer                   : Factor w/ 147 levels "Acer Group","ACTION",..: 141 141 141 141 98 98 141 76 28 28 ...
 $ Computer                       : Factor w/ 2825 levels " eServer pSeries 655 (1.7 GHz Power4+)",..: 790 799 798 798 2345 2344 796 995 2730 2730 ...
 $ Country                        : Factor w/ 60 levels "Australia","Austria",..: 58 58 58 58 26 7 58 58 58 58 ...
 $ Processors                     : num  1024 544 512 512 4 ...
 $ RMax                           : num  59.7 30.4 30.4 30.4 23.2 20 15.1 13.9 13.7 13.7 ...
 $ RPeak                          : num  131 69.6 65.5 65.5 25.6 ...
 $ Nmax                           : num  52224 36864 36864 36864 6400 ...
 $ Nhalf                          : num  24064 16384 16384 16384 830 ...
 $ Processor.Family               : Factor w/ 26 levels "","Alpha","AMD",..: 25 25 25 25 21 21 25 14 7 7 ...
 $ Processor                      : Factor w/ 463 levels "Alpha","Alpha EV4",..: 267 267 267 267 133 133 267 63 25 25 ...
 $ Processor.Speed..MHz.          : num  32 32 32 32 400 ...
 $ System.Family                  : Factor w/ 179 levels " IBM Power Systems",..: 174 174 174 174 122 122 174 104 33 33 ...
 $ Operating.System               : Factor w/ 85 levels "AIX","Amazon Linux 2",..: 10 10 10 10 59 59 10 30 74 74 ...
 $ Architecture                   : Factor w/ 6 levels "Cluster","Constellations",..: 3 3 3 3 6 6 3 3 6 6 ...
 $ Segment                        : Factor w/ 7 levels "Academic","Classified",..: 6 4 1 2 7 6 6 1 7 6 ...
 $ Application.Area               : Factor w/ 48 levels "","Aerospace",..: 37 37 37 37 37 46 37 37 37 37 ...
 $ Interconnect.Family            : Factor w/ 27 levels "10G","Cray Interconnect",..: 8 8 8 8 3 3 8 17 17 17 ...
 $ Interconnect                   : Factor w/ 90 levels "100G Ethernet",..: 37 37 37 37 67 67 37 71 71 71 ...
 $ Region                         : Factor w/ 16 levels "Australia and New Zealand",..: 6 6 6 6 4 6 6 6 6 6 ...
 $ Continent                      : Factor w/ 5 levels "Africa","Americas",..: 2 2 2 2 3 2 2 2 2 2 ...
 $ Power                          : num  NA NA NA NA NA NA NA NA NA NA ...
 $ System.Model                   : Factor w/ 539 levels "","Acer AR585 F1 Cluster",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Total.Cores                    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Measured.Size                  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Processor.Cores                : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Accelerator                    : Factor w/ 7 levels "","ATI GPU","IBM PowerXCell 8i",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Name                           : Factor w/ 848 levels "","&#346;wierk Computing Centre",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Accelerator.Cores              : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Efficiency....                 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Mflops.Watt                    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Processor.Technology           : Factor w/ 26 levels "","AMD x86_64",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ OS.Family                      : Factor w/ 6 levels "","BSD Based",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cores.per.Socket               : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Processor.Generation           : Factor w/ 75 levels "","AMD Naples",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Previous.Rank                  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ First.Appearance               : num  NA NA NA NA NA NA NA NA NA NA ...
 $ First.Rank                     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Accelerator.Co.Processor.Cores : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Accelerator.Co.Processor       : Factor w/ 59 levels "","AMD FirePro S10000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Power.Source                   : Factor w/ 4 levels "","Derived","Optimized",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Rmax..TFlop.s.                 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Rpeak..TFlop.s.                : num  NA NA NA NA NA NA NA NA NA NA ...
 $ HPCG..TFlop.s.                 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Power..kW.                     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Power.Effeciency..GFlops.Watts.: num  NA NA NA NA NA NA NA NA NA NA ...
 $ Site.ID                        : num  NA NA NA NA NA NA NA NA NA NA ...
 $ System.ID                      : int  NA NA NA NA NA NA NA NA NA NA ...
3.3.2.2.3.2 Processor Speed
library(ggplot2)

ggplot() +
    geom_jitter(data = df,
                alpha = 0.5,
                height = 0.0,
                size = 1.5,
                aes(x = Year,
                    y = Processor.Speed..MHz. / 1000,
                    color = cut(Rank,
                                breaks = c(1, 167, 334, 500),
                                include.lowest = TRUE))) +
                                        #scale_y_log10() +
    scale_x_continuous(breaks = function(x) { seq(floor(min(x)),
                                                  ceiling(max(x)),
                                                  4) }) +
    scale_color_brewer(name = "TOP500 Rank", palette = "Set1") +
    ylab("Processor Speed (GHz)") +
    theme_bw(base_size = 27) +
    theme(legend.position = c(0.25, 0.95),
          legend.direction = "horizontal",
          legend.background = element_rect(fill = "transparent", colour = NA),
          legend.text = element_text(size = 15),
          legend.title = element_text(size = 15)) +
    guides(color = guide_legend(nrow = 3, override.aes = list(alpha = 1.0, size = 4)))
3.3.2.2.3.3 NMax
library(ggplot2)

ggplot() +
    geom_jitter(data = df,
                alpha = 0.5,
                height = 0.0,
                size = 1.5,
                aes(x = Year,
                    y = Nmax,
                    color = cut(Rank,
                                breaks = c(1, 167, 334, 500),
                                include.lowest = TRUE))) +
    scale_x_continuous(breaks = function(x) { seq(floor(min(x)),
                                                  ceiling(max(x)),
                                                  4) }) +
    scale_color_brewer(name = "TOP500 Rank", palette = "Set1") +
    ylab("Problem Size to Reach RMax") +
    scale_y_log10() +
    theme_bw(base_size = 27) +
    theme(legend.position = c(0.25, 0.95),
          legend.direction = "horizontal",
          legend.background = element_rect(fill = "transparent", colour = NA),
          legend.text = element_text(size = 15),
          legend.title = element_text(size = 15),
          axis.text.y = element_text(angle = 90, hjust = 0.5)) +
    guides(color = guide_legend(nrow = 3, override.aes = list(alpha = 1.0, size = 4)))
3.3.2.2.3.4 RPeak and RMax
library(ggplot2)
library(dplyr)
library(tidyr)

plot_df <- df %>%
    mutate(RMax = RMax / 1e3,
           RPeak = RPeak / 1e3,
           RMaxT = coalesce(RMax, Rmax..TFlop.s.),
           RPeakT = coalesce(RPeak, Rpeak..TFlop.s.)) %>%
    select(Rank,
           Year,
           RMaxT,
           RPeakT,
           HPCG..TFlop.s.) %>%
    gather(-Rank, -Year, key = "Type", value = "Count") %>%
    mutate(Type = factor(Type,
                         levels = c("RPeakT",
                                    "RMaxT",
                                    "HPCG..TFlop.s."),
                         labels = c("RPeak (HPL)",
                                    "RMax (HPL)",
                                    "RMax (HPCG)"))) %>%
    filter(is.finite(Count))

ggplot() +
    geom_jitter(data = plot_df,
                alpha = 0.5,
                height = 0.0,
                size = 1.5,
                aes(x = Year,
                    y = Count,
                    color = cut(Rank,
                                breaks = c(1, 167, 334, 500),
                                include.lowest = TRUE))) +
    scale_x_continuous(breaks = function(x) { seq(floor(min(x)),
                                                  ceiling(max(x)),
                                                  6) }) +
    scale_color_brewer(name = "TOP500 Rank", palette = "Set1") +
    ylab("Performance (TFlops/s)") +
    scale_y_log10() +
    theme_bw(base_size = 27) +
    theme(legend.position = c(0.83, 0.09),
          legend.direction = "horizontal",
          legend.background = element_rect(fill = "transparent", colour = NA),
          legend.text = element_text(size = 15),
          legend.title = element_text(size = 15),
          axis.text.y = element_text(angle = 90, hjust = 0.5)) +
    guides(color = guide_legend(nrow = 3, override.aes = list(alpha = 1.0, size = 4))) +
    facet_wrap(. ~ Type, ncol = 3)

3.3.3 [2020-04-27 Mon]

3.3.3.1 KL-Exchange Notes

3.4 May

3.4.1 [2020-05-16 Sat]

3.4.1.1 CSmith Histogram

library(dplyr)
library(ggplot2)

df <- read.csv("data/0e/ce81f1-5ade-41e4-b217-c0d157e93922/100_depth.csv")

ggplot(df) +
    geom_col(aes(x = depth,
                 y = occurrences)) +
    theme_bw(base_size = 28)
library(dplyr)
library(ggplot2)

df <- read.csv("data/0e/ce81f1-5ade-41e4-b217-c0d157e93922/100_depth.csv")

ggplot(df, aes(x = depth,
               y = occurrences)) +
    geom_point() +
    geom_smooth(formula = y ~ I(1 / x),
                method = "lm") +
    theme_bw(base_size = 28)

The paper compares the performance of Julia with that of the pthreads C API using atomic operations and simple synchronization tasks, and also compares the performance of Julia and that of the MPI C++ API in microbenchmarks for communication, reads, and writes.

The paper presents an implementation of HPCG, a standard, communication-heavy, and large scale benchmark for HPC.

The paper argues that the Julia HPCG implementation outperforms the C++ MPI implementation in certain experimental conditions, and that Julia is capable of performing certain synchronization operations faster than the pthreads API.

3.4.1.1.1 Strengths and Weaknesses
3.4.1.1.1.1 Strengths

The paper is clearly written and easy to read. The paper reports its statistical analysis and experimental design choices, such as the number of repeated measurements and the criteria under which some outliers were removed. The paper includes interesting discussions, such as the likely underlying causes of each observation and some of the implementation choices for HPCG.

3.4.1.1.1.2 Weaknesses

The paper does not make available any artifacts for the verification of data, code, or statistical analysis, mentioning these will be made available after acceptance. Under these conditions it is impossible to verify or attempt to reproduce results. The claims that the Julia language is capable of outperforming the C++ MPI implementation on HPCG seem rushed, given the data presented.

3.4.1.1.2 Detailed Comments

I believe that a more comprehensive experimental analysis of the performance of Julia in parallel and distributed settings should be performed to be able to claim improvements over well established technologies such as MPI and pthreads.

Although motivating and interesting, the microbenchmarks and small-scale threading comparisons do not provide a solid basis for claiming improvements, and the same is valid for HPCG. Further work should aim to compile a comprehensive set of large-scale, possibly real-world, applications written in Julia and C++ MPI.

The paper mentions that the variance of the HPCG measurements was large for the Julia code. It would enrich and better support the discussion to add confidence intervals for these measurements, and for the other graphs presented as well. It is usual to compute an interval of twice the standard error of the mean, in order to obtain an approximately 95% confidence interval.

3.4.1.1.3 Suggestions for Improvement
  • Add the 95% confidence intervals for the mean, on all plots, but especially for Figure 10
  • Perform performance evaluations on a more comprehensive benchmark, in order to better support claims of improvement
  • “Figure III-A shows the results.” ~> Should be “Figure 4”, at page 6
  • “perfromance” ~> “performance”, at page 10

3.4.2 [2020-07-20 Mon]

3.4.2.1 Extension Request

The experimental validation of the approaches studied during this thesis, namely Design of Experiments and Gaussian Process Regression, took longer than expected in the initial SPAPT benchmark. The large set of problems and number of configurable parameters slowed down a comprehensive and careful analysis of the relationships between parameters and performance for many problems.

The Gaussian Process Regression approach introduced further questions and restrictions for experimental validation, and the method was also applied to a different domain, the selection of bit precision for layers of Deep Neural Networks, where completing a single experimental run takes a full week.

I have already started my efforts towards writing the thesis, but our current assessment of the state of the work indicates that I most likely will not be able to finish before the end of the third year’s limit date.

Compounding the experimental difficulties sustained during the last year, the COVID-19 pandemic greatly hindered the pace of my work, as it did for many others. I continue my cotutelle from Brazil, a country that is at the moment the epicenter of the pandemic. During the final stages of my thesis, I had to relocate to a different city and

3.4.3 [2020-07-25 Sat]

3.4.3.1 Review for Fabio (Middleware 2020)

3.4.3.1.1 Paper #72

Title: ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics

3.4.3.1.1.1 A. Overall merit
  1. Weak reject (I don’t want to publish this)
3.4.3.1.1.2 B. Reviewer expertise
  1. I have published/worked on most/all this paper’s topics
3.4.3.1.1.3 C. Paper summary

The paper presents an autotuner for 44 parameters of the Spark data analytics framework. The optimization approach is a composite of Random Forests to identify the most significant parameters, Latin Hypercube Sampling to better cover the search space, and Gaussian Process Regression to determine configurations to test, balancing exploitation of detected relationships and exploration of the search space. The paper performs experimental evaluation on five SparkBench workloads with three datasets each. Although the proposed approach achieves small but statistically significant improvements in relation to the optimization methods compared in the study, the paper does not present performance or optimization baselines. The paper does not compare the achieved performance to the performance of default, or sensibly picked, Spark configurations. The results of the optimizers are also not compared to the best configurations found by naive methods such as Random Search.

3.4.3.1.1.4 D. Strengths
  • Adapts well-established optimization techniques from Bayesian Optimization and Statistical Learning to the Spark tuning problem, exploring better-performing regions of search spaces and decreasing tuning costs with respect to other tuning strategies
3.4.3.1.1.5 E. Weaknesses
  • Proposes an interesting high-dimensional optimization problem, with around 200 dimensions, but the actual studied problem targets a more tractable search space, one order of magnitude smaller, with only 44 parameters. The paper does not mention the number of possible configurations
  • Performance improvements over established work seem to be statistically significant although small, but the paper does not compare the results with proper baselines. Performance is never compared to the sensible default Spark configurations for each workload, or to optimizations from more naive methods
3.4.3.1.1.6 F. Comments for author
3.4.3.1.1.6.1 Major Comments
  • The paper does not compare the performance achieved by autotuning strategies to the performance achieved by more naive optimization strategies using the same budget, such as Random Search, which could also be based on LHS. Without optimization and performance baselines it is difficult to measure the effectiveness of the autotuners
  • The comparisons between the proposed system and others are not balanced. The study allows the proposed system to reuse results from previous runs in two of the three datasets for each problem, which increases the effective experimental budget and the optimization time used by the technique. The author should modify figures to reflect this imbalance
  • Page 8: “For ROBOTune, we treat each workload to be an unseen one for the first dataset, and for the remaining two, ROBOTune uses cached parameters and configurations through memoized sampling.”
    • This means that the optimization time and budget comparisons presented in Figures 4 and 6 are not balanced between methods. If the proposed approach reuses measurements, its optimization costs must be updated accordingly
3.4.3.1.1.6.2 Minor Comments
  • Page 3: “Eventually, it [Bayesian Optimization] locates the global extremum for the objective function, or nearly so within a limited number of observations.”
    • Finding the global optimum in a limited number of measurements is not a guarantee of Bayesian Optimization
  • Page 5: “Furthermore, LHS is dimension agnostic, meaning that the number of samples required is not tied to the dimensionality of the configuration space.”
    • Obtaining an effective covering of the search space using Latin Hypercube Sampling is tied to the problem dimension. A sample with only a handful of LHS points in a high-dimensional space suffers from the same issues as a uniform sample, that is, most points end up in the grid regions near the “shell” of the search space
  • Wrong date on page headers: “Conference’17, July 2017”
  • Page 5: “Another way of computing feature importances is through tree-based estimators. Unlike linear models, they apply to non-linear relationships.”
    • Linear models can represent non-linear relationships between factors; they are only restricted to being linear (or linearizable) in the estimated parameters
  • Page 5: “We compare the coefficient of determination (R2) for two linear and two tree-based models in Figure 2.”
    • It is hard to compare the qualities of fit of the models in Figure 2 without knowing the underlying models for the Lasso and the Lasso + Ridge (ENet) approaches, since adding polynomial terms would increase flexibility of these approaches and consequently increase R^2
  • The reference could be improved by adding publisher information: “[11] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. 1984. Classification and Regression Trees. Taylor & Francis. https://books.google.com/books?id=JwQx-WOmSyQC”
3.4.3.1.1.7 G. Comments for PC
3.4.3.1.2 Final Opinion

The rebuttal did not satisfactorily address my main concerns, and I recommend rejecting the paper. My comments mostly regard the experimental validation of the approach. Detailed comments are listed below.

3.4.3.1.2.1 Comments on the Rebuttal

> Comment-72E-2: Comparison to the default/naive?
>
> We compared our results to the default in Section-5.2. It resulted in frequent
> Out-Of-Memory/Runtime errors. We need 5 runs for each dataset, each run taking
> up to ~10 hours. Due to time constraints we chose to focus on established
> methods.

A fair comparison baseline would be a configuration considered good enough for each dataset, that is, a configuration used in other work, one that is well established in industry for this type of data, or at least one that does not result in frequent errors. A Spark analogue of GCC’s “-O3” option, a common baseline in work on compiler optimization, would play this role. That way, gains over sensible defaults could be properly assessed. Another fair baseline would be the best configuration found by a random search using the same experimental budget. The paper should be resubmitted after sufficient data is collected.

> Comment-72E-3: Comparison not balanced.
>
> This could be a misunderstanding. In Section-5.3 we exclude the initial one-time
> cost of parameter selection as this is a fixed cost. This overhead varies with
> the number of datasets tuned, 3 used in our evaluation. Using more (5 or 10)
> will change the overhead.

My previous comment was written under the impression that the proposed method leverages cached results from previous runs. If that is the case, the comparisons between approaches must count the cost of the initial evaluations that are cached and reused as part of the total optimization cost of the approach. If the method does not in fact reuse cached results in further optimization, please ignore my comment.

3.5 September

3.5.1 [2020-09-13 Sun]

3.5.1.1 Running GPR on Steven’s Data

3.5.1.1.1 Loading Data for Analysis
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
'data.frame':	23120 obs. of  9 variables:
 $ elements_number   : int  3 2 4 2 2 2 2 4 4 3 ...
 $ y_component_number: int  3 2 1 1 1 2 2 2 4 1 ...
 $ vector_length     : int  4 1 4 1 8 2 1 8 16 4 ...
 $ temporary_size    : int  4 2 2 2 2 2 4 4 2 4 ...
 $ load_overlap      : num  2 1 2 1 2 2 1 2 2 2 ...
 $ threads_number    : int  64 128 64 256 128 128 128 64 128 32 ...
 $ lws_y             : int  64 1 32 64 32 8 2 2 128 32 ...
 $ time_per_pixel    : num  1.11e-08 1.58e-10 2.34e-09 1.39e-09 3.40e-09 ...
 $ row_number        : int  1 2 3 4 5 6 7 8 9 10 ...
3.5.1.1.2 GPR Code
'data.frame':	23120 obs. of  9 variables:
 $ elements_number   : int  3 2 4 2 2 2 2 4 4 3 ...
 $ y_component_number: int  3 2 1 1 1 2 2 2 4 1 ...
 $ vector_length     : int  4 1 4 1 8 2 1 8 16 4 ...
 $ temporary_size    : int  4 2 2 2 2 2 4 4 2 4 ...
 $ load_overlap      : num  2 1 2 1 2 2 1 2 2 2 ...
 $ threads_number    : int  64 128 64 256 128 128 128 64 128 32 ...
 $ lws_y             : int  64 1 32 64 32 8 2 2 128 32 ...
 $ time_per_pixel    : num  1.11e-08 1.58e-10 2.34e-09 1.39e-09 3.40e-09 ...
 $ row_number        : int  1 2 3 4 5 6 7 8 9 10 ...
[1] 18
Error in t.default(T) : argument is not a matrix
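The GPR fitting code itself is not captured in this export. Below is a minimal sketch of how a Gaussian Process surrogate with a constant trend and an estimated nugget could be fit to a subsample of the data shown above, using the DiceKriging package; the data frame name design_data and the subsample size are assumptions for illustration, not necessarily what the original analysis used.

library(DiceKriging)

# design_data is assumed to hold the 23120-row data.frame shown above (hypothetical name)
set.seed(7)
idx <- sample(nrow(design_data), 120)

# Parameters as the design, time per pixel as the response
X <- design_data[idx, c("elements_number", "y_component_number", "vector_length",
                        "temporary_size", "load_overlap", "threads_number", "lws_y")]
y <- design_data[idx, "time_per_pixel"]

# Constant-trend GP with a Matérn 5/2 kernel and an estimated nugget
gpr_fit <- km(design = X, response = y, covtype = "matern5_2", nugget.estim = TRUE)

# Posterior mean and standard deviation at a few held-out points
held_out <- design_data[-idx, ][1:5, names(X)]
pred <- predict(gpr_fit, newdata = held_out, type = "UK")
pred$mean
pred$sd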
3.5.1.1.3 Sub-spaces Reached by each Method
library(GGally)
library(dplyr)
library(grid)
library(patchwork)
library(gridExtra)

gpr_data <- read.csv("dopt_anova_experiments/data/gpr_nugget_best_points.csv") %>%
    select(-slowdown, -time_per_pixel, -measurement_order,
           -experiment_id, -X, -row_number) %>%
    mutate(method = "GPRN")

# names(gpr_data) <- c("slowdown", "method", "point_number", "time_per_pixel")

gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_nosd_best_points.csv") %>%
    select(-slowdown, -time_per_pixel, -measurement_order,
           -experiment_id, -X, -row_number) %>%
    mutate(method = "GPR")

# names(gpr_lin_data) <- c("slowdown", "method", "point_number", "time_per_pixel")

df_all_methods <- read.csv("dopt_anova_experiments/data/complete_1000.csv",
                           strip.white = TRUE, header = TRUE) %>%
    select(-slowdown, -point_number, -vector_recompute, -time_per_pixel)

levels <- c("RS", "LHS", "GS", "GSR",
            "GA","LM", "LMB", "LMBT",
            "RQ", "DOPT", "DLM", "DLMT",
            "GPR", "GPRN")

selected_methods <- c("RS", "LHS", "GS", "GSR",
                      "GA", "LM", "DLMT", "GPR", "GPRN")

df_all_methods <- df_all_methods %>%
    mutate(load_overlap = as.numeric(as.factor(load_overlap))) %>%
    bind_rows(gpr_data, gpr_lin_data) %>%
    mutate(method = factor(method,
                           levels = levels)) %>%
    filter(method %in% selected_methods) %>%
    group_by(method) %>%
    ungroup()

gprn <- ggpairs(df_all_methods %>%
                filter(method == "GPRN") %>%
                select(-method)) +
    ggtitle("GPRN: Trend + Nugget")

gpr <- ggpairs(df_all_methods %>%
               filter(method == "GPR") %>%
               select(-method)) +
    ggtitle("GPR: Trend")

dlmt <- ggpairs(df_all_methods %>%
                filter(method == "DLMT") %>%
                select(-method)) +
    ggtitle("DLMT")

dlm <- ggpairs(df_all_methods %>%
               filter(method == "LM") %>%
               select(-method)) +
    ggtitle("LM")

p1 <- grid.grabExpr(print(dlm))
p2 <- grid.grabExpr(print(dlmt))
p3 <- grid.grabExpr(print(gpr))
p4 <- grid.grabExpr(print(gprn))

grid.arrange(p1, p2, p3, p4, ncol = 2)
3.5.1.1.4 Histogram
library(ggplot2)
library(dplyr)

#gpr_data <- read.csv("dopt_anova_experiments/data/gpr_nugget_best_points.csv") %>%
# gpr_data <- read.csv("gpr_dampening_sc30_d05_best_points.csv") %>%
#     select(slowdown, method, measurement_order, time_per_pixel) %>%
#     mutate(method = "GPRD")
#
# names(gpr_data) <- c("slowdown", "method", "point_number", "time_per_pixel")

#gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_nosd_best_points.csv") %>%
gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_03sd_nugget_best_points.csv") %>%
    select(slowdown, method, measurement_order, time_per_pixel) %>%
    mutate(method = "GPR")

names(gpr_lin_data) <- c("slowdown", "method", "point_number", "time_per_pixel")

df_all_methods <- read.csv("dopt_anova_experiments/data/complete_1000.csv",
                           strip.white = TRUE, header = TRUE)

levels <- c("RS", "LHS", "GS", "GSR", "GA", "LM", "RQ", "DLMT", "GPR")
labels <- c("RS", "LHS", "GS", "GSR", "GA", "LM", "QR", "DLMT", "GPR")

df_all_methods <- df_all_methods %>%
    select(slowdown, method, point_number, time_per_pixel) %>%
    bind_rows(gpr_lin_data) %>%
    mutate(method = factor(method,
                           levels = levels,
                           labels = labels)) %>%
    filter(method %in% labels) %>%
    group_by(method) %>%
    mutate(mean = mean(slowdown),
           median = median(slowdown),
           ci95 = 1.96 * sd(slowdown) / sqrt(n()),
           max = max(slowdown)) %>%
    ungroup()

ggplot(df_all_methods) +
    facet_grid(method ~ .) +
    theme_bw(base_size = 15) +
    coord_cartesian(xlim = c(.9, 4),
                    ylim = c(0, 1000)) +
    geom_histogram(aes(slowdown),
                   binwidth = .05,
                   fill = "gray48") +
    scale_y_continuous(breaks = c(0, 1000),
                       labels = c("0", "1000")) +
    geom_curve(aes(x = max + .1,
                   y = 500,
                   xend = max,
                   yend = 5),
               arrow = arrow(length = unit(0.05, "npc")),
               curvature = 0.3,
               stat = "unique") +
    geom_text(aes(x = max + .2,
                  y = 550,
                  label = "max"),
              stat = "unique") +
    geom_rect(aes(xmin = mean - ci95,
                  xmax = mean + ci95,
                  ymin = 0,
                  ymax = 1000,
                  fill = "red"),
              alpha = 0.3,
              stat = "unique") +
    geom_vline(aes(xintercept = median),
               color = "darkgreen",
               linetype = 3,
               stat = "unique") +
    geom_vline(aes(xintercept = mean),
               color = "red",
               linetype = 2,
               stat = "unique") +
    labs(y = "Count",
         x = "Slowdown compared to the optimal solution") +
    scale_fill_discrete(name = "",
                        breaks = c("red"),
                        labels = c("Mean error")) +
    ggtitle("") +
    theme(legend.position = "none",
          strip.background = element_rect(fill = "white"))
3.5.1.1.5 Final Table
library(xtable)
library(dplyr)

gpr_lin_data <- read.csv("dopt_anova_experiments/data/gpr_03sd_nugget_best_points.csv") %>%
    select(slowdown, method, measurement_order, time_per_pixel) %>%
    mutate(method = "GPR")

names(gpr_lin_data) <- c("slowdown", "method", "point_number", "time_per_pixel")

df_all_methods = read.csv("./dopt_anova_experiments/data/complete_1000.csv",
                          strip.white = T, header = T) %>%
    select(slowdown, method, point_number) %>%
    bind_rows(gpr_lin_data) %>%
    filter(method %in% c("RS", "LHS", "GS", "GSR", "GA", "LM",
                         "DLMT", "RQ", "GPR")) %>%
    mutate(method =  factor(method,
                            levels = c("RS", "LHS", "GS", "GSR", "GA", "LM",
                                       "RQ", "DLMT", "GPR"),
                            labels = c("Random Sampling (RS)",
                                       "Latin Hypercube Sampling (LHS)",
                                       "Greedy Search (GS)",
                                       "Greedy Search w. Restart (GSR)",
                                       "Genetic Algorithm (GA)",
                                       "Linear Model (LM)",
                                       "Quantile Regression (QR)",
                                       "D-Opt., Linear Model w. Transform (DLMT)",
                                       "Gaussian Process Regression w. EI (GPR)")))

summaries = df_all_methods %>%
    group_by(method) %>%
    summarise(method = method,
              mean_slowdown = mean(slowdown),
              min_slowdown = min(slowdown),
              max_slowdown = max(slowdown),
              mean_budget = mean(point_number),
              max_budget = max(point_number),
              .groups = "keep") %>%
    distinct() %>%
    ungroup()

cap <- paste("Slowdown and budget used by 7 optimization methods on",
             "the Laplacian Kernel, using a budget of 125 points with 1000 repetitions")

x <- xtable(summaries,
            caption = cap,
            digits = 2,
            label = "tab:gpu_laplacian_compare_budget")

align(x) <- xalign(x)
display(x) <- display(x)

header = paste(paste("Method", "Mean", "Min.", "Max.", "Mean", "Max.",
               sep = " & "), "\\\\ \n \\midrule \n")

super_header = paste("\\toprule \n",
                     "& \\multicolumn{3}{c}{Slowdown} & \\multicolumn{2}{c}{Budget}",
                     "\\\\ \n")

bottom = "\\bottomrule\n"

print(x,
      size = "\\small",
      booktabs = TRUE,
      include.rownames = FALSE,
      include.colnames = FALSE,
      hline.after = NULL,
      add.to.row = list(pos = as.list(c(-1,
                                        0,
                                        7,
                                        nrow(summaries))),
                        command = c(super_header,
                                    header,
                                    "\\rowcolor{red!25}",
                                    bottom)),
      math.style.exponents = TRUE,
      table.placement = "b",
      caption.placement = "top")

3.5.2 [2020-09-18 Fri]

3.5.2.1 R session sample

:

 [1] 11.025812 11.525666 10.855048 11.342663 11.620696 11.578011 10.151870
 [8] 10.311008 11.926957  9.512951
 num [1:10] 10.03 10.53 9.86 10.34 10.62 ...

3.5.3 [2020-09-24 Thu]

3.5.3.1 CI for Speedups

[1] 1.250000 1.253689 1.066590 1.476254

3.6 October

3.6.1 [2020-10-06 Tue]

3.6.1.1 Related work Pedro (from Arnaud)

3.6.1.1.1 Chat notes
3.6.1.1.2 Minimisation (f is known)

When the learning rate decreases at an appropriate rate, and subject to relatively mild assumptions, stochastic gradient descent converges almost surely to a global minimum when the objective is convex or pseudoconvex, and to a local minimum otherwise.

(Momentum to avoid zig-zagging, backtracking line search so that the objective strictly decreases, conjugate gradient or Newton's method if the Hessian is available, Nelder-Mead if derivatives cannot be computed; a minimal sketch with base R's optim() follows below.)

In the meta-heuristic taxonomy, SA is a single-state method and GA is a population method (in a sub-class along with PSO, ACO, et al, usually referred to as biologically-inspired meta-heuristics).

  • Constrained optimization (requires a Lagrangian unless a change of variables helps). Additional difficulty: avoiding getting stuck on the boundary and then following it with absurdly small steps.

Convergence rate degrades with dimension and bad normalization.
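A minimal sketch of the derivative-free approaches mentioned above, using base R's optim() with Nelder-Mead and with simulated annealing (a single-state meta-heuristic); the Rosenbrock objective and the starting point are assumptions for illustration.

# Toy objective: 2-dimensional Rosenbrock function (minimum at c(1, 1))
rosenbrock <- function(x) {
    (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
}

# Nelder-Mead: no derivatives required
nm <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "Nelder-Mead")

# Simulated annealing: single-state stochastic search
sa <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "SANN",
            control = list(maxit = 20000))

nm$par  # close to c(1, 1)
sa$par  # noisier; depends on the annealing schedule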

3.6.1.1.3 Learning (f is not known and should be approximated from x samples)
  • statistical learning: optimize the loss or maximum likelihood for model parameters. In general seeks a good prediction “everywhere”. Tries to evaluate uncertainty on parameters and predictions.
  • “machine” learning: efficient optimization of the loss (i.e., $E_{x,\theta}[y_\theta(x) - y(x)]$)

Statistical and machine learning generally assume that the samples x are fixed inputs (observations vs. controlled experiments), which raises difficulties regarding generalization/overfitting, discriminating causation from correlation, counterfactual analysis, etc.
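A minimal sketch of “optimizing the loss for model parameters” on toy data: an ordinary least squares fit via lm(), and the same coefficients recovered by minimizing the squared loss directly with optim(); the data and variable names are assumptions for illustration.

set.seed(1)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)

# Closed-form least squares fit
fit_lm <- coef(lm(y ~ x))

# The same fit as explicit loss minimization
squared_loss <- function(theta) sum((y - (theta[1] + theta[2] * x))^2)
fit_optim <- optim(par = c(0, 0), fn = squared_loss)$par

fit_lm
fit_optim  # should be close to fit_lm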

3.6.1.1.4 Design of Experiments (find the “best” x to obtain a good model)

Assuming a model for f, propose a set of points x to measure so as to obtain the best estimates of the model parameters θ and of the prediction; a minimal sketch of two of the constructions listed below follows the list.

  • Linear models
    • Fractional Designs
    • Screening
    • D-optimal designs
  • Unconstrained models (gaussian process)
    • LHS
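A minimal sketch of two of the constructions listed above, assuming a toy space with three two-level factors: a D-optimal design for a main-effects linear model with AlgDesign::optFederov, and a Latin Hypercube sample with lhs::randomLHS; factor names and design sizes are assumptions for illustration.

library(AlgDesign)
library(lhs)

# Candidate set: full factorial over three two-level factors
candidates <- expand.grid(f1 = c(-1, 1), f2 = c(-1, 1), f3 = c(-1, 1))

# D-optimal selection of 6 runs for a main-effects linear model
d_opt <- optFederov(~ f1 + f2 + f3, data = candidates, nTrials = 6)
d_opt$design

# Latin Hypercube sample: 10 points in the unit cube [0, 1]^3
lhs_sample <- randomLHS(n = 10, k = 3)
head(lhs_sample)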
3.6.1.1.5 Online and Reinforcement Learning

Choose x_i depending on previous observations and obtain feedback. Same rules: interact with the world and get feedback, and try to find the “best” configuration.

  • Online machine learning: regret from 1 to T, no discount. Often relies on stochastic gradient.
    • Special simple case = Bandit
      • Stochastic: UCB/Thompson, log(T)
      • Adversarial: EXP3, sqrt(T)
    • Many variants: Linear UCB, LinRel, Gaussian UCB
  • Reinforcement learning: optimize (infinite discounted) regret, generally through Markov Decision Process (i.e., dynamic programming).
    • Well studied for finite-state space MDPs
    • Does not scale well in high dimension

In both cases, the balance between exploration and exploitation is led by (temporally aggregated) regret; a minimal UCB1 sketch for the stochastic bandit case follows below.
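A minimal UCB1 sketch on a toy 3-armed Bernoulli bandit; the arm means and the horizon are assumptions for illustration.

set.seed(42)
arm_means <- c(0.3, 0.5, 0.7)  # unknown to the algorithm
n_arms <- length(arm_means)
horizon <- 1000

counts <- rep(0, n_arms)  # pulls per arm
sums <- rep(0, n_arms)    # accumulated rewards per arm

for (t in 1:horizon) {
    if (t <= n_arms) {
        arm <- t  # play each arm once to initialize
    } else {
        ucb <- sums / counts + sqrt(2 * log(t) / counts)
        arm <- which.max(ucb)
    }
    reward <- rbinom(1, 1, arm_means[arm])
    counts[arm] <- counts[arm] + 1
    sums[arm] <- sums[arm] + reward
}

counts  # most pulls should concentrate on the best arm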

3.6.1.1.6 Fast exploration of expensive black-box functions
  • Reduce dimension by exploiting knowledge of the geometry of f (e.g., identify insignificant parameters or parameters that need to be set to “obvious” values).
  • EGO:

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4), 455-492. EGO targets the minimization of the expected deviation from the extremum of the studied function; a sketch of the Expected Improvement acquisition it relies on follows below.

  • EI, Kriging Believer (no guarantee) and GaussianUCB differ by the criterion used to select the next points to evaluate
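A minimal sketch of the Expected Improvement acquisition used by EGO (written for minimization), assuming a surrogate's posterior mean mu and standard deviation sigma at candidate points and the best observation so far f_min; the toy numbers are assumptions for illustration.

expected_improvement <- function(mu, sigma, f_min) {
    z <- (f_min - mu) / sigma
    ei <- (f_min - mu) * pnorm(z) + sigma * dnorm(z)
    ei[sigma <= 0] <- 0  # no posterior uncertainty, no expected improvement
    ei
}

# Toy candidates: the next point to evaluate maximizes EI
mu <- c(1.2, 0.9, 1.0)
sigma <- c(0.05, 0.30, 0.10)
expected_improvement(mu, sigma, f_min = 1.0)
which.max(expected_improvement(mu, sigma, f_min = 1.0))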

3.6.2 [2020-10-18 Sun]

3.6.2.1 Brice’s Links for Sampling on a Constrained Triangle

3.6.3 [2020-10-21 Wed]

3.6.3.1 Arnaud’s References

3.6.3.2 Surrogates Book

3.6.4 [2020-10-22 Thu]

3.6.4.1 Student’s Justification

The experimental validation of the approaches studied during this thesis, namely Design of Experiments and Gaussian Process Regression, took longer than expected in the initial SPAPT benchmark. The large set of problems and number of configurable parameters slowed down a comprehensive and careful analysis of the relationships between parameters and performance for many problems.

The Gaussian Process Regression approach introduced further questions and restrictions for experimental validation, and the method was also applied to a different domain, the selection of bit precision for layers of Deep Neural Networks, where completing a single experimental run takes a full week.

I have already started writing the thesis, but our current assessment of the state of the work indicates that I will most likely not be able to finish before the third year's deadline. We intend to finish writing the thesis by the end of December 2020, and consequently to schedule the thesis defense for March 2021.

3.6.4.2 Advisor’s Justification

The contribution of this thesis is an evaluation of the effectiveness of established search, learning, and statistical optimization methods which, although well known, had not been comprehensively applied and evaluated in the specific domain of optimization and performance modeling of computer programs.

We initially applied a Design of Experiments approach to model and optimize the performance of the code produced by frameworks for source-to-source transformation. One of the target frameworks was SPAPT, consisting of several programs and configurable parameters. We published our initial exploration and optimization of these programs in 2019, but the more extensive subsequent study, involving the explicit identification of significant parameters for each kernel, and the exploration of different modeling assumptions, such as quantile regression, took more time than expected.

In parallel to our Design of Experiments study, we also applied Gaussian Process Regression, a different category of optimization method, to a program optimization problem in a different application domain, namely, the adaptive selection of bit precision for each layer of Deep Neural Networks. This experiment involved comparing our optimization method with a reference Reinforcement Learning algorithm. Experiments involve training and inference steps, which are necessary for model validation, and in experimental scenarios such as the ones studied in this work a single experiment takes up to a week to complete. Our exploration of the particular conditions that make Gaussian Process Regression methods perform well in this sort of problem domain also took longer than expected. We studied sampling strategies that enabled local exploration of promising points, and compared the adequacy of different acquisition functions to determine such points. We have also revisited previous work on our source-to-source transformation benchmark on GPUs, applying and studying this optimization method in its context as well.

Work on writing the thesis has started, and is planned to conclude by December 2020. The defense will then take place in March 2021.

3.7 November

3.7.1 [2020-11-11 Wed]

3.7.1.1 Meeting with Alfredo

  • Vinicius’ payment
  • Nathan’s payment
  • Cotutelle extension agreement @ USP
  • Funding attestation (attestation de financement) until March for my registration at UGA
  • Papers: Luciano, Carlos (iWAPT), Giuliano (?)
  • Thesis
  • HPE

4 2021

4.1 January

4.1.1 [2021-01-22 Fri]

4.1.1.1 Screening for CUDA Parameters

4.1.1.1.1 Cloning and Updating the Repository
4.1.1.1.2 Quadro M1200
4.1.1.1.2.1 Loading Data
'data.frame':	38 obs. of  19 variables:
 $ _use_fast_math                 : chr  "on" "on" "on" "on" ...
 $ _opt_level_                    : int  2 2 2 2 2 2 2 2 2 3 ...
 $ _gpu_architecture_             : chr  "sm_50" "sm_50" "sm_50" "sm_50" ...
 $ _ftz_                          : chr  "false" "false" "false" "false" ...
 $ _force_store_cache_            : chr  "cg" "cg" "cg" "cg" ...
 $ _force_load_cache_             : chr  "cs" "cs" "cs" "cs" ...
 $ _allow_expensive_optimizations_: chr  "false" "false" "false" "false" ...
 $ _fmad_                         : chr  "true" "true" "true" "true" ...
 $ _optimize_                     : int  2 2 2 2 2 2 2 2 2 3 ...
 $ _preserve_relocs               : chr  "on" "on" "on" "on" ...
 $ _relocatable_device_code_      : chr  "false" "false" "false" "false" ...
 $ _prec_div_                     : chr  "true" "true" "true" "true" ...
 $ _maxrregcount_                 : int  32 32 32 32 32 32 32 32 32 63 ...
 $ _prec_sqrt_                    : chr  "false" "false" "false" "false" ...
 $ _no_align_double               : chr  "on" "on" "on" "on" ...
 $ response                       : num  3.32 3.32 3.32 3.32 3.33 ...
 $ experiment_id                  : chr  "Model Prediction (GAU)" "Model Prediction (GAU)" "Model Prediction (GAU)" "Model Prediction (GAU)" ...
 $ experiment_type                : chr  "screening_prediction" "screening_prediction" "screening_prediction" "screening_prediction" ...
 $ GPU                            : chr  "Quadro M1200" "Quadro M1200" "Quadro M1200" "Quadro M1200" ...
'data.frame':	70 obs. of  6 variables:
 $ _opt_level_       : int  2 2 2 2 2 2 2 2 2 2 ...
 $ _gpu_architecture_: chr  "sm_50" "sm_50" "sm_50" "sm_50" ...
 $ response          : num  3.34 3.34 3.34 3.34 3.34 ...
 $ experiment_id     : chr  "Baseline (GAU)" "Baseline (GAU)" "Baseline (GAU)" "Baseline (GAU)" ...
 $ experiment_type   : chr  "baseline" "baseline" "baseline" "baseline" ...
 $ GPU               : chr  "Quadro M1200" "Quadro M1200" "Quadro M1200" "Quadro M1200" ...
'data.frame':	793 obs. of  23 variables:
 $ dummy3                         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 1 ...
 $ _use_fast_math                 : chr  "off" "off" "off" "off" ...
 $ _opt_level_                    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ _gpu_architecture_             : chr  "sm_50" "sm_50" "sm_50" "sm_50" ...
 $ _ftz_                          : chr  "true" "true" "true" "true" ...
 $ _force_store_cache_            : chr  "cg" "cg" "cg" "cg" ...
 $ _force_load_cache_             : chr  "cs" "cs" "cs" "cs" ...
 $ _allow_expensive_optimizations_: chr  "false" "false" "false" "false" ...
 $ _fmad_                         : chr  "true" "true" "true" "true" ...
 $ _optimize_                     : int  2 2 2 2 2 2 2 2 2 3 ...
 $ dummy2                         : int  1 1 1 1 1 1 1 1 1 -1 ...
 $ _preserve_relocs               : chr  "off" "off" "off" "off" ...
 $ _relocatable_device_code_      : chr  "true" "true" "true" "true" ...
 $ _prec_div_                     : chr  "false" "false" "false" "false" ...
 $ dummy1                         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 1 ...
 $ _maxrregcount_                 : int  63 63 63 63 63 63 63 63 63 32 ...
 $ dummy4                         : int  1 1 1 1 1 1 1 1 1 -1 ...
 $ _prec_sqrt_                    : chr  "true" "true" "true" "true" ...
 $ _no_align_double               : chr  "off" "off" "off" "off" ...
 $ response                       : num  3.32 3.33 3.33 3.33 3.32 ...
 $ experiment_id                  : chr  "Screening (GAU)" "Screening (GAU)" "Screening (GAU)" "Screening (GAU)" ...
 $ experiment_type                : chr  "screening" "screening" "screening" "screening" ...
 $ GPU                            : chr  "Quadro M1200" "Quadro M1200" "Quadro M1200" "Quadro M1200" ...
4.1.1.1.2.2 Computing Main Effects from Screening Data
4.1.1.1.2.2.1 Setup
4.1.1.1.2.2.2 Plot
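The Setup step's code is not captured in this export; it fits the main-effects models lm_gau, lm_hwl, and lm_ndl used by the plot below. A minimal sketch of what such a fit could look like for the GAU screening runs, assuming a data frame screening_gau shaped like the 793-row structure shown above (the name is hypothetical); analogous fits would produce lm_hwl and lm_ndl.

library(dplyr)

# screening_gau is assumed to hold the GAU screening runs shown above (hypothetical name)
lm_gau <- lm(response ~ .,
             data = screening_gau %>%
                 select(-experiment_id, -experiment_type, -GPU))

summary(lm_gau)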
library(dplyr)
library(tidyr)
library(tibble)
library(broom)
library(ggplot2)
library(latex2exp)

bind_confint_lm <- function(model, experiment) {
    bind_cols(tidy(model) %>%
              rename(mean = estimate) %>%
              arrange(term),
              data.frame(confint(model)) %>%
              rownames_to_column() %>%
              rename(ci_inf = `X2.5..`,
                     ci_sup = `X97.5..`) %>%
              arrange(rowname)) %>%
        select(-rowname) %>%
        mutate(experiment = experiment)
}

plot_df = bind_rows(bind_confint_lm(lm_gau, "GAU"),
                    bind_confint_lm(lm_hwl, "HWL"),
                    bind_confint_lm(lm_ndl, "NDL")) %>%
    mutate(term = trimws(term, whitespace = "[_, `]")) %>%
    filter(!grepl("dummy", term)) %>%
    filter(!grepl("Intercept", term)) %>%
    mutate(term = gsub("_`", " = ", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("`", " = ", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("allow_expensive_optimizations", "exp_opt", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("gpu_architecture", "arch", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("no_align_double", "no_align_db", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("relocatable_device_code", "rel_dev_code", term,
                       fixed = FALSE))

p_effects_quadro = ggplot() +
    geom_point(data = plot_df,
               aes(x = term,
                   y = mean),
               size = 3) +
    geom_errorbar(data = plot_df,
                  aes(x = term,
                      ymin = ci_inf,
                      ymax = ci_sup),
                  width = 0.4) +
    geom_hline(data = plot_df,
               aes(x = term),
               linetype = 2,
               yintercept = 0.0) +
    ylab(TeX("Mean Effect Estimate $\\pm$ 95% CI")) +
    ggtitle("Quadro M1200") +
    facet_grid(experiment ~ .,
               scales = "free_y") +
    theme_bw(base_size = 24) +
    theme(plot.title = element_text(size = 19),
          strip.background = element_rect(fill = "white"),
          axis.title.x = element_blank(),
          axis.text.x = element_text(size = 18,
                                     angle = 30,
                                     hjust = 1,
                                     vjust = 1))

p_effects_quadro

'data.frame':	30 obs. of  19 variables:
 $ _use_fast_math                 : chr  "on" "on" "on" "on" ...
 $ _opt_level_                    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ _gpu_architecture_             : chr  "sm_50" "sm_50" "sm_50" "sm_50" ...
 $ _ftz_                          : chr  "true" "true" "true" "true" ...
 $ _force_store_cache_            : chr  "cg" "cg" "cg" "cg" ...
 $ _force_load_cache_             : chr  "cg" "cg" "cg" "cg" ...
 $ _allow_expensive_optimizations_: chr  "false" "false" "false" "false" ...
 $ _fmad_                         : chr  "false" "false" "false" "false" ...
 $ _optimize_                     : int  3 3 3 3 3 3 3 3 3 3 ...
 $ _preserve_relocs               : chr  "off" "off" "off" "off" ...
 $ _relocatable_device_code_      : chr  "false" "false" "false" "false" ...
 $ _prec_div_                     : chr  "false" "false" "false" "false" ...
 $ _maxrregcount_                 : int  32 32 32 32 32 32 32 32 32 32 ...
 $ _prec_sqrt_                    : chr  "false" "false" "false" "false" ...
 $ _no_align_double               : chr  "on" "on" "on" "on" ...
 $ response                       : num  1.38 1.28 1.28 1.3 1.29 ...
 $ experiment_id                  : chr  "Model Prediction (GAU)" "Model Prediction (GAU)" "Model Prediction (GAU)" "Model Prediction (GAU)" ...
 $ experiment_type                : chr  "screening_prediction" "screening_prediction" "screening_prediction" "screening_prediction" ...
 $ GPU                            : chr  "Titan X" "Titan X" "Titan X" "Titan X" ...
'data.frame':	60 obs. of  6 variables:
 $ _opt_level_       : int  2 2 2 2 2 2 2 2 2 2 ...
 $ _gpu_architecture_: chr  "sm_50" "sm_50" "sm_50" "sm_50" ...
 $ response          : num  1.33 1.33 1.32 1.31 1.33 ...
 $ experiment_id     : chr  "Baseline (GAU)" "Baseline (GAU)" "Baseline (GAU)" "Baseline (GAU)" ...
 $ experiment_type   : chr  "baseline" "baseline" "baseline" "baseline" ...
 $ GPU               : chr  "Titan X" "Titan X" "Titan X" "Titan X" ...
'data.frame':	600 obs. of  23 variables:
 $ dummy3                         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ _use_fast_math                 : chr  "off" "off" "off" "off" ...
 $ _opt_level_                    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ _gpu_architecture_             : chr  "sm_50" "sm_50" "sm_50" "sm_50" ...
 $ _ftz_                          : chr  "true" "true" "true" "true" ...
 $ _force_store_cache_            : chr  "cg" "cg" "cg" "cg" ...
 $ _force_load_cache_             : chr  "cs" "cs" "cs" "cs" ...
 $ _allow_expensive_optimizations_: chr  "false" "false" "false" "false" ...
 $ _fmad_                         : chr  "true" "true" "true" "true" ...
 $ _optimize_                     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ dummy2                         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ _preserve_relocs               : chr  "off" "off" "off" "off" ...
 $ _relocatable_device_code_      : chr  "true" "true" "true" "true" ...
 $ _prec_div_                     : chr  "false" "false" "false" "false" ...
 $ dummy1                         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ _maxrregcount_                 : int  63 63 63 63 63 63 63 63 63 63 ...
 $ dummy4                         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ _prec_sqrt_                    : chr  "true" "true" "true" "true" ...
 $ _no_align_double               : chr  "off" "off" "off" "off" ...
 $ response                       : num  1.35 1.3 1.3 1.33 1.3 ...
 $ experiment_id                  : chr  "Screening (GAU)" "Screening (GAU)" "Screening (GAU)" "Screening (GAU)" ...
 $ experiment_type                : chr  "screening" "screening" "screening" "screening" ...
 $ GPU                            : chr  "Titan X" "Titan X" "Titan X" "Titan X" ...
4.1.1.1.2.3 Computing Main Effects from Screening Data
4.1.1.1.2.3.1 Setup
4.1.1.1.2.3.2 Plot
library(dplyr)
library(tidyr)
library(tibble)
library(broom)
library(ggplot2)
library(latex2exp)

bind_confint_lm <- function(model, experiment) {
    bind_cols(tidy(model) %>%
              rename(mean = estimate) %>%
              arrange(term),
              data.frame(confint(model)) %>%
              rownames_to_column() %>%
              rename(ci_inf = `X2.5..`,
                     ci_sup = `X97.5..`) %>%
              arrange(rowname)) %>%
        select(-rowname) %>%
        mutate(experiment = experiment)
}

plot_df = bind_rows(bind_confint_lm(lm_gau, "GAU"),
                    bind_confint_lm(lm_hwl, "HWL"),
                    bind_confint_lm(lm_ndl, "NDL")) %>%
    mutate(term = trimws(term, whitespace = "[_, `]")) %>%
    filter(!grepl("dummy", term)) %>%
    filter(!grepl("Intercept", term)) %>%
    mutate(term = gsub("_`", " = ", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("`", " = ", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("allow_expensive_optimizations", "exp_opt", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("gpu_architecture", "arch", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("no_align_double", "no_align_db", term,
                       fixed = FALSE)) %>%
    mutate(term = gsub("relocatable_device_code", "rel_dev_code", term,
                       fixed = FALSE))

p_effects_titan = ggplot() +
    geom_point(data = plot_df,
               aes(x = term,
                   y = mean),
               size = 3) +
    geom_errorbar(data = plot_df,
                  aes(x = term,
                      ymin = ci_inf,
                      ymax = ci_sup),
                  width = 0.4) +
    geom_hline(data = plot_df,
               aes(x = term),
               linetype = 2,
               yintercept = 0.0) +
    ylab(TeX("Mean Effect Estimate $\\pm$ 95% CI")) +
    ggtitle("Titan X") +
    facet_grid(experiment ~ .,
               scales = "free_y") +
    theme_bw(base_size = 24) +
    theme(plot.title = element_text(size = 19),
          strip.background = element_rect(fill = "white"),
          axis.title.x = element_blank(),
          axis.text.x = element_text(size = 18,
                                     angle = 30,
                                     hjust = 1,
                                     vjust = 1))

p_effects_titan

tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.12 0.105 0.104 0.105 0.104 ...
 $ experiment_type: chr [1:30] "OT" "OT" "OT" "OT" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.106 0.106 0.106 0.106 0.106 ...
 $ ci95           : num [1:30] 0.00307 0.00307 0.00307 0.00307 0.00307 ...
4.1.1.1.2.4 Titan X
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.215 0.193 0.19 0.208 0.154 ...
 $ experiment_type: chr [1:30] "OT" "OT" "OT" "OT" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.186 0.186 0.186 0.186 0.186 ...
 $ ci95           : num [1:30] 0.0108 0.0108 0.0108 0.0108 0.0108 ...
4.1.1.1.3 Computing Speedups DataFrame
4.1.1.1.3.1 Merge Data
4.1.1.1.3.2 Plot
library(ggplot2)
library(dplyr)

df = read.csv("autotuning_screening_experiment/data/all_speedups.csv") %>%
    mutate(experiment_type = factor(experiment_type,
                                    levels = c("Screening",
                                               "OpenTuner")))

ggplot() +
    geom_point(data = df,
               size = 7,
               alpha = 1,
               aes(y = experiment_id,
                   x = speedup,
                   color = GPU),
               position = position_dodge(width = 0.8),
               show.legend = TRUE) +
    geom_vline(data = df,
               aes(y = experiment_id),
               linetype = 2,
               xintercept = 1) +
    geom_linerange(data = df,
                   aes(y = experiment_id,
                       xmin = speedup,
                       xmax = 1.0,
                       color = GPU),
                   linetype = 2,
                   position = position_dodge(width = 0.8),
                   show.legend = FALSE) +
    facet_grid(. ~ experiment_type) +
    xlab("Speedup vs. -O3") +
    scale_color_brewer(palette = "Dark2") +
    theme_bw(base_size = 37) +
    theme(axis.title.y = element_blank(),
          legend.position = c(0.85, 0.9),
          legend.background = element_blank(),
          legend.title = element_blank(),
          legend.text = element_text(size = 33),
          strip.text = element_text(size = 33),
          strip.background = element_rect(fill = "transparent"))
4.1.1.1.4 Loading Longer OpenTuner Data
4.1.1.1.4.1 Quadro M1200
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.104 0.104 0.104 0.105 0.105 ...
 $ experiment_type: chr [1:30] "OT:200" "OT:200" "OT:200" "OT:200" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.104 0.104 0.104 0.104 0.104 ...
 $ ci95           : num [1:30] 0.000213 0.000213 0.000213 0.000213 0.000213 ...
4.1.1.1.4.2 Titan X
tibble [30 × 5] (S3: tbl_df/tbl/data.frame)
 $ response       : num [1:30] 0.201 0.18 0.174 0.18 0.164 ...
 $ experiment_type: chr [1:30] "OT:200" "OT:200" "OT:200" "OT:200" ...
 $ experiment_id  : chr [1:30] "NDL" "NDL" "NDL" "NDL" ...
 $ mean_r         : num [1:30] 0.177 0.177 0.177 0.177 0.177 ...
 $ ci95           : num [1:30] 0.00714 0.00714 0.00714 0.00714 0.00714 ...
4.1.1.1.5 Composing Figures
4.1.1.1.5.1 Main Effects
library(ggplot2)
library(patchwork)

p_effects_titan / p_effects_quadro
4.1.1.1.5.2 Timing
library(ggplot2)
library(patchwork)

p_timing_ot_quadro = p_timing_quadro +
    geom_jitter(data = ot_df,
                size = 4,
                shape = 1,
                alpha = 1,
                aes(x = experiment_type,
                    y = response,
                    color = experiment_type),
                width = 0.3,
                height = 0.0,
                show.legend = FALSE) +
    geom_errorbar(data = ot_df,
                  aes(x = experiment_type,
                      ymin = mean_r - ci95,
                      ymax = mean_r + ci95,
                      color = experiment_type),
                  show.legend = FALSE,
                  width = 0.3) +
    geom_point(data = ot_df,
               size = 4,
               aes(x = experiment_type,
                   y = mean_r,
                   color = experiment_type),
               show.legend = FALSE) +
    geom_jitter(data = ot_longer_df,
                size = 4,
                shape = 1,
                alpha = 1,
                aes(x = experiment_type,
                    y = response,
                    color = experiment_type),
                width = 0.3,
                height = 0.0,
                show.legend = FALSE) +
    geom_errorbar(data = ot_longer_df,
                  aes(x = experiment_type,
                      ymin = mean_r - ci95,
                      ymax = mean_r + ci95,
                      color = experiment_type),
                  show.legend = FALSE,
                  width = 0.3) +
    geom_point(data = ot_longer_df,
               size = 4,
               aes(x = experiment_type,
                   y = mean_r,
                   color = experiment_type),
               show.legend = FALSE)

p_timing_ot_titan = p_timing_titan +
    geom_jitter(data = titan_ot_df,
                size = 4,
                shape = 1,
                alpha = 1,
                aes(x = experiment_type,
                    y = response,
                    color = experiment_type),
                width = 0.3,
                height = 0.0,
                show.legend = FALSE) +
    geom_errorbar(data = titan_ot_df,
                  aes(x = experiment_type,
                      ymin = mean_r - ci95,
                      ymax = mean_r + ci95,
                      color = experiment_type),
                  show.legend = FALSE,
                  width = 0.3) +
    geom_point(data = titan_ot_df,
               size = 4,
               aes(x = experiment_type,
                   y = mean_r,
                   color = experiment_type),
               show.legend = FALSE) +
    geom_jitter(data = titan_ot_longer_df,
                size = 4,
                shape = 1,
                alpha = 1,
                aes(x = experiment_type,
                    y = response,
                    color = experiment_type),
                width = 0.3,
                height = 0.0,
                show.legend = FALSE) +
    geom_errorbar(data = titan_ot_longer_df,
                  aes(x = experiment_type,
                      ymin = mean_r - ci95,
                      ymax = mean_r + ci95,
                      color = experiment_type),
                  show.legend = FALSE,
                  width = 0.3) +
    geom_point(data = titan_ot_longer_df,
               size = 4,
               aes(x = experiment_type,
                   y = mean_r,
                   color = experiment_type),
               show.legend = FALSE)

p_timing_ot_titan | p_timing_ot_quadro
4.1.1.1.5.3 Speedup
library(ggplot2)
library(patchwork)

p_speedup_titan | p_speedup_quadro

4.1.2 [2021-01-27 Wed]

4.1.2.1 Math Calendar 25 January

library(ggplot2)
library(primes)

prime_list = generate_n_primes(10)
combinations = expand.grid(p = prime_list, q = prime_list)
combinations$is_prime = is_prime((as.double(combinations$p) ^
                                  as.double(combinations$q)) + 1.0)

ggplot() +
    geom_tile(data = combinations,
              aes(x = as.factor(p),
                  y = as.factor(q),
                  fill = is_prime),
              show.legend = FALSE) +
    scale_fill_brewer(palette = "Greys") +
    theme_void()
4.1.2.1.1 Parsing and Saving Data
4.1.2.1.2 Loading Data
4.1.2.1.2.1 Baseline and Tuned
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Warning message:
replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
4.1.2.1.2.2 Tuning Progress
4.1.2.1.3 Speedup Plots
4.1.2.1.3.1 Rodinia
library(dplyr)
library(ggplot2)
library(latex2exp)
library(stringr)

plot_df = speedups_df %>%
    filter(id == "Rodinia",
           gpu != "Tesla-K40-nvcc65",
           kernel != "ParticleFilterFloat",
           kernel != "ParticleFilterNaive") %>%
    mutate(kernel = gsub("Pathfinder", "pathfinder", kernel, fixed = TRUE),
           kernel = factor(kernel,
                           levels = c("b+tree", "backprop", "bfs", "gaussian",
                                      "heartwall", "hotspot", "kmeans",
                                      "lavaMD", "lud", "myocyte", "needle",
                                      "pathfinder"),
                           labels = c("BPT", "BCK", "BFS", "GAU", "HWL",
                                      "HOT", "KMN", "LMD", "LUD", "MYO",
                                      "NDL", "PTF")))

ggplot() +
    geom_point(data = plot_df,
               aes(y = kernel,
                   x = speedup,
                   color = gpu),
               size = 5,
               position = position_dodge(width = 0.8)) +
    geom_linerange(data = plot_df,
                   aes(y = kernel,
                       xmin = speedup,
                       xmax = 1.0,
                       color = gpu),
                   linetype = 2,
                   position = position_dodge(width = 0.8)) +
    scale_color_brewer(palette = "Set1") +
    geom_vline(xintercept = 1,
               linetype = 2,
               size = 0.8) +
    xlab(TeX("Speedup against \\textit{-O2}")) +
    theme_bw(base_size = 30) +
    theme(legend.position = c(0.9, 0.1),
          legend.title = element_blank(),
          legend.background = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_text(face = "italic"))

4.1.2.1.4 Specs
  • Processor: Xeon E5-2630V3
  • Single Core Turbo Clock: 3.2e9 Hz
  • DGEMV Input Size: M = 20000, N = 20000
4.1.2.1.5 Old Discussion
  • Run MKL with 20000 at Xeon 2630v3
  • Verify if matrix initialization is accounted for

Finally, EGO explores well but does not gain much compared to the initial sampling. At the same time, there is perhaps not much left to gain.

Xeon xeon_e5_2630_v3. https://ranker.sisoftware.co.uk/top_device.php?q=c2ffc9f1d7bad7eadafc8eb383a5ccf1c1e78fb282a4dce1d1f792f7cafdcbfddba895a492&l=en reports 26.14 GFLOPS, but still...

The right reference for all of this is: https://ark.intel.com/content/www/us/en/ark/products/83356/intel-xeon-processor-e5-2630-v3-20m-cache-2-40-ghz.html

For “mm” (DGEMM):

The matrices are 2,000 elements per side.

One multiply + one add, so: $2 \times 2000^3 / 307.2e9$ = 0.052 seconds... :( That cannot be a matrix product at this size.

Well, in the end it is a DGEMV, so we can still compute the FLOP bound: $2 \times 20000^2 / 307.2e9$ ≈ 2 milliseconds... but it is memory bound :) On the memory side: $20000^2 \times 8 / 59e9$ ≈ 54 milliseconds. We are on the order of a second (1.13 s), so a factor of 20 away from the optimum.

There are 2 types of RAM. If it is a server with 1600 MHz RAM, we lose 15%, so we are far from the peak.

To do this properly we should compare against the MKL, but it is possible that SPAPT's parameterized kernel cannot reach that peak (which, in the DGEMV case, should be close to a memcopy).

Orio uses gcc internally.

We could try again with DGEMM, but we may hit the same problem. If their parameterized kernel is badly designed, we will be far from the peak...

In the end, the benchmark set is badly designed. :(

4.1.2.1.6 Some Calculations
[1] 52.08333
[1] 8e+08
[1] 400020000
[1] 0.015625
[1] 0.05423729

Time taken: max(mem, cpu)

[1] 0.054
[1] 0.8677966
[1] 20.92593
4.1.2.1.7 Theoretical Roofline Model for the Xeon E5 2630 v3
library(ggplot2)
library(dplyr)
library(directlabels)
library(ggrepel)
library(latex2exp)

roofline <- function(peak_perf, peak_mem, arithm_intensity) {
    return(min(peak_mem * arithm_intensity, peak_perf))
}

perf_gemv <- function(size) {
    return(((size ^ 2) * 2) / 1e9)
}

mem_gemv <- function(size) {
    return((((size ^ 2) + size) * 8) / 1e9)
}

peak_perf_8core = 307.2

freq_8core = 2.4
freq_1core = 3.2
cores = 8

peak_perf_1core = ((peak_perf_8core / freq_8core) *
                   freq_1core) / cores

peak_mem = 59

arithm_intensity = seq(0, 2, length.out = 1000)

roofline_df = data.frame(arithmetic_intensity = arithm_intensity,
                         performance = sapply(arithm_intensity,
                                              function(x) {
                                                  roofline(peak_perf_1core, peak_mem, x) }))

roofline_df = roofline_df %>%
    mutate(memory_bound = factor(performance < peak_perf_1core,
                                 levels = c(TRUE, FALSE),
                                 labels = c("Memory Bound",
                                            "Compute Bound")))
gemv_size = 20000
gemv_perf = perf_gemv(gemv_size)
gemv_mem = mem_gemv(gemv_size)

gemv_peak_perf = max(gemv_perf / peak_perf_1core,
                     gemv_mem / peak_mem)

fastest_time = gemv_perf / 1.160198

gemv_df = data.frame(id = "GEMV",
                     peak_theoretical = gemv_peak_perf,
                     arithmetic_intensity = gemv_perf / gemv_mem,
                     peak_achieved = fastest_time)

ggplot() +
    geom_line(data = roofline_df,
              aes(x = arithmetic_intensity,
                  y = performance,
                  color = memory_bound),
              size = 2.4) +
    geom_dl(data = roofline_df %>%
                filter(performance < peak_perf_1core),
            aes(x = arithmetic_intensity,
                y = performance,
                color = memory_bound,
                label = paste(peak_mem, "GB/s")),
            method = list("smart.grid", rot = 53, cex = 2.5, vjust = 0)) +
    geom_dl(data = roofline_df %>%
                filter(performance >= peak_perf_1core),
            aes(x = arithmetic_intensity,
                y = performance,
                color = memory_bound,
                label = paste(peak_perf_1core, "GFLOPs/s")),
            method = list("smart.grid", rot = 0, cex = 2.5, vjust = 2.7)) +
    geom_point(data = gemv_df,
               aes(x = arithmetic_intensity,
                   y = arithmetic_intensity * peak_mem),
               size = 6) +
    geom_text_repel(data = gemv_df,
               aes(x = arithmetic_intensity,
                   y = arithmetic_intensity * peak_mem,
                   label = "Theoretical GEMV (N = 2e4)"),
               size = 11,
               hjust = -0.15) +
    geom_point(data = gemv_df,
               aes(x = arithmetic_intensity,
                   y = peak_achieved),
               size = 6) +
    geom_text_repel(data = gemv_df,
               aes(x = arithmetic_intensity,
                   y = peak_achieved,
                   label = "Best Achieved GEMV (20x slower)"),
               size = 11,
               hjust = -0.15) +
    ylab("Single Core GFLOPs/s") +
    xlab("FLOPs/byte") +
    scale_color_brewer(palette = "Dark2") +
    theme_bw(base_size = 28) +
    theme(legend.position = c(0.86, 0.1),
          legend.background = element_blank(),
          legend.title = element_blank(),
          legend.text = element_text(size = 30))

4.1.3 [2021-03-15 Mon]

4.1.3.1 Files for the Microsoft Latin America PhD Award Application

4.1.3.1.1 Application to the Microsoft Latin America PhD Award (didn’t pan out)
4.1.3.1.1.1 Search Heuristics and Statistical Learning methods for Autotuning HPC Programs

High Performance Computing has been a cornerstone of collective scientific and industrial progress for at least five decades. Paying the cost of increased complexity, software and hardware engineering advances continue to overcome several challenges on the way of the sustained performance improvements observed during the last fifty years. This mounting complexity means that reaching the advertised hardware performance for a given program requires not only expert knowledge of a given hardware architecture, but also mastery of programming models and languages for parallel and distributed computing.

If we state performance optimization problems as search or learning problems, by converting implementation and configuration choices to parameters which might affect performance, we can draw and adapt proven methods from search, mathematical optimization and statistics. The effectiveness of these adapted methods on autotuning problems varies greatly, and hinges on practical and mathematical properties of the problem and the corresponding search space.

When adapting methods for autotuning, we must face challenges emerging from practical properties such as restricted time and cost budgets, constraints on feasible parameter values, and the need to mix categorical, continuous, and discrete parameters. To achieve useful results, we must also choose methods that make hypotheses compatible with problem search spaces, such as the existence of discoverable, or at least exploitable, relationships between parameters and performance. Choosing an autotuning method requires determining a balance between the exploration of a problem, when we would seek to discover and explain relationships between parameters and performance, and the exploitation of the best optimizations we can find, when we would seek only to optimize performance.

The effectiveness of search heuristics on autotuning can be limited \cite{seymour2008comparison,balaprakash2011can,balaprakash2012experimental}, among other factors, by underlying hypotheses about the search space, such as the reachability of the global optimum and the smoothness of search space surfaces, which frequently do not hold. The derivation of relationships between parameters and performance from search heuristic optimizations is greatly hindered, if not rendered impossible, by the biased way these methods explore parameters. Some parametric learning methods, such as Design of Experiments, are not widely applied to autotuning. These methods perform structured parameter exploration, and can be used to build and validate performance models, generating transparent and cost-effective optimizations \cite{mametjanov2015autotuning,bruel2019autotuning}. Other methods from the parametric family are more widely used, such as Bandit Algorithms \cite{xu2017parallel}. Nonparametric learning methods, such as Decision Trees \cite{balaprakash2016automomml} and Gaussian Process Regression \cite{parsa2019pabo}, are able to reduce model bias greatly, at the expense of increased prediction variance. Figure \ref{fig:tree} categorizes some autotuning methods according to some of the key hypotheses and branching questions underlying each method.

During this thesis I have adapted and studied the effectiveness of different search heuristics and statistical learning methods for optimizing performance in several autotuning domains. At the beginning of my PhD at the University of São Paulo (USP), I published a paper on optimizing the configuration of the CUDA compiler \cite{bruel2017autotuning}, where we reached up to 4 times better performance in comparison with a high-level compiler optimization. In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo Alto, I wrote a paper on the autotuning of a compiler for High-Level Synthesis for FPGAs \cite{bruel2017autotuninghls}, where we reached, on average, 25% improvements in performance, size, and complexity of designs.

At the end of 2017, I joined the cotutelle PhD program at the University of Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I applied Design of Experiments to the autotuning of a source-to-source transformation compiler \cite{bruel2019autotuning}, showing that we can achieve significant speedups by exploiting search space structure under a strict budget. I have also collaborated with HPE on another paper, providing an analysis of the applicability of autotuning methods to a Hardware-Software Co-design problem \cite{bruel2017generalize}. During my Teaching Assistant internships, I published one paper \cite{bruel2017openmp} on parallel programming teaching, and collaborated on another \cite{goncalves2016openmp}, where we showed that teaching lower-level programming models, despite being more challenging at first, provides a stronger core understanding.

I continue to collaborate with HPE researchers on the application of autotuning methods to optimize Neural Networks, hardware accelerators for Deep Learning, and algorithms for dealing with network congestion. With my advisors, I currently manage one undergraduate student and four master's students, who are applying the statistical learning autotuning methods I studied and adapted to different domains in the context of a joint USP/HPE research project. I am strongly motivated to continue pursuing a career in Computer Science research, aiming to produce rigorous and value-adding contributions. I hereby submit my thesis proposal and application to the Microsoft Latin America PhD Award.

4.1.3.1.1.2 (Short) Search Heuristics and Statistical Learning methods for Autotuning HPC Programs

High Performance Computing has been a cornerstone of collective scientific and industrial progress for at least five decades. Paying the cost of increased complexity, software and hardware engineering advances continue to overcome several challenges on the way of the sustained performance improvements observed during the last fifty years. This mounting complexity means that reaching the advertised hardware performance for a given program requires not only expert knowledge of a given hardware architecture, but also mastery of programming models and languages for parallel and distributed computing.

If we state performance optimization problems as search or learning problems, by converting implementation and configuration choices to parameters which might affect performance, we can draw and adapt proven methods from search, mathematical optimization and statistics. The effectiveness of these adapted methods on autotuning problems varies greatly, and hinges on practical and mathematical properties of the problem and the corresponding search space.

When adapting methods for autotuning, we must face challenges emerging from practical properties such as restricted time and cost budgets, constraints on feasible parameter values, and the need to mix categorical, continuous, and discrete parameters. To achieve useful results, we must also choose methods that make hypotheses compatible with problem search spaces, such as the existence of discoverable, or at least exploitable, relationships between parameters and performance. Choosing an autotuning method requires determining a balance between the exploration of a problem, when we would seek to discover and explain relationships between parameters and performance, and the exploitation of the best optimizations we can find, when we would seek only to optimize performance.

During this thesis I have adapted and studied the effectiveness of different search heuristics and statistical learning methods for optimizing performance in several autotuning domains. At the beginning of my PhD at the University of São Paulo (USP), I published a paper on optimizing the configuration of the CUDA compiler \cite{bruel2017autotuning}, where we reached up to 4 times better performance in comparison with a high-level compiler optimization. In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo Alto, I wrote a paper on the autotuning of a compiler for High-Level Synthesis for FPGAs \cite{bruel2017autotuninghls}, where we reached, on average, 25% improvements in performance, size, and complexity of designs.

At the end of 2017, I joined the cotutelle PhD program at the University of Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I applied Design of Experiments to the autotuning of a source-to-source transformation compiler \cite{bruel2019autotuning}, showing that we can achieve significant speedups by exploiting search space structure under a strict budget. I have also collaborated with HPE on another paper, providing an analysis of the applicability of autotuning methods to a Hardware-Software Co-design problem \cite{bruel2017generalize}.

I continue to collaborate with HPE researchers on the application of autotuning methods to optimize Neural Networks, hardware accelerators for Deep Learning, and algorithms for dealing with network congestion. With my advisors, I currently manage one undergraduate student and four master's students, who are applying the statistical learning autotuning methods I studied and adapted to different domains in the context of a joint USP/HPE research project. I am strongly motivated to continue pursuing a career in Computer Science research, aiming to produce rigorous and value-adding contributions. I hereby submit my thesis proposal and application to the Microsoft Latin America PhD Award.

4.1.3.2 Arnaud and Brice: March 15th Meeting

4.1.3.2.1 Postdoc training sessions @ Inria
  • Registration OK
  • Project, lab, advisor?
  • Can it be Arnaud, or someone at LIG?
4.1.3.2.2 Empirical Roofline
  • Almost 2x peak when using march=native
  • Despite that, no improvement in the best configuration for dgemv kernel
    • Around 20% improvement for the -O3 kernel version
  • Good argument that the kernel is not capable of leveraging the parameters
4.1.3.2.3 Laplacian
  • Looking at a specific semi-automatic example
  • Factors were not always eliminated
  • Later, experiments were automated