MaxDiff Analysis of MSBA Class Preferences

Counts, maximum likelihood, and Bayesian estimation of the MNL model

R
MaxDiff
MNL
Bayesian
Marketing Analytics
A MaxDiff analysis of UCSD Rady MSBA class preferences using counts, MNL maximum likelihood, and a hand-coded Metropolis sampler.
Author

Austin Li

Published

May 28, 2026

Code
library(tidyverse)
library(gt)
library(scales)
library(forcats)

theme_set(
  theme_minimal(base_size = 12) +
    theme(
      plot.title = element_text(face = "bold"),
      plot.subtitle = element_text(color = "#5f6b75"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
)

course_palette <- c(
  "MGTA 403" = "#4E79A7",
  "MGTA 464" = "#F28E2B",
  "MGTA 451" = "#E15759",
  "MGTA 452" = "#76B7B2",
  "MGTA 453" = "#59A14F",
  "MGTA 444" = "#EDC948",
  "MGTA 455" = "#B07AA1",
  "MGTA 454" = "#FF9DA7",
  "MGTA 495" = "#9C755F",
  "MGTA 457" = "#BAB0AC"
)

log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

row_log_sum_exp <- function(mat) {
  m <- do.call(pmax, as.data.frame(mat))
  m + log(rowSums(exp(mat - m)))
}

Introduction

MaxDiff, also called Best-Worst Scaling, is a survey method for measuring relative preference. Instead of asking respondents to rate every item on a 1-10 scale, the survey repeatedly shows a smaller set of items and asks for the most-preferred and least-preferred option. In this study, each valid screen shows four UCSD Rady MSBA core classes.

The logic from the MaxDiff lecture is that people are usually better at making trade-offs than calibrating numeric scales. A rating of “8” may mean different things to different people, but choosing the best and worst item on a screen forces direct differentiation. That makes MaxDiff useful for marketing questions like feature priority, claim testing, course preference, or product-bundle design.

This analysis estimates the relative appeal of 10 MSBA courses in three ways. First, I compute a simple counts score: percent chosen best minus percent chosen worst. Second, I estimate an aggregate multinomial logit (MNL) model by maximum likelihood. Third, I fit the same MNL model with a weakly informative Bayesian prior and a hand-coded random-walk Metropolis sampler. The three methods should broadly agree, especially at the top and bottom of the ranking, but they communicate uncertainty in different ways.

The Data

Code
raw <- read_csv("../../../hw4/maxdiff_design_and_choices.csv", show_col_types = FALSE) |>
  rename(
    record_id = `Record ID`,
    task = Task,
    position = Position,
    item = Item,
    response = Response
  )

labels <- read_csv("../../../hw4/maxdiff_item_labels.csv", show_col_types = FALSE) |>
  select(item = `Item #`, label = Item) |>
  mutate(
    course = str_extract(label, "MGTA [0-9]+"),
    short_label = paste(course, str_extract(label, "(?<= )[A-Za-z.]+"))
  )

excluded_rows <- raw |>
  filter(task > 8 | item > 10)

maxdiff <- raw |>
  filter(task <= 8, item <= 10)

task_summary <- maxdiff |>
  group_by(record_id, task) |>
  summarise(
    items_per_task = n(),
    n_best = sum(response == 1),
    n_worst = sum(response == -1),
    n_neutral = sum(response == 0),
    .groups = "drop"
  ) |>
  mutate(valid_task = items_per_task == 4 & n_best == 1 & n_worst == 1 & n_neutral == 2)

data_summary <- tibble(
  statistic = c(
    "Respondents",
    "Tasks per respondent",
    "Items per valid task",
    "Items in study",
    "Rows used",
    "Rows excluded",
    "Invalid MaxDiff tasks after cleaning"
  ),
  value = c(
    n_distinct(maxdiff$record_id),
    paste(range(table(maxdiff$record_id) / 4), collapse = " to "),
    paste(range(task_summary$items_per_task), collapse = " to "),
    n_distinct(maxdiff$item),
    nrow(maxdiff),
    nrow(excluded_rows),
    sum(!task_summary$valid_task)
  )
)

The raw file contains some rows that do not match the MaxDiff structure described in the assignment. Specifically, tasks 9-15 include Item = 11 and only two rows per task, while the class-label file defines only 10 course items. I therefore focus on the valid MaxDiff portion: tasks 1-8 and items 1-10.

Code
data_summary |>
  gt() |>
  tab_header(title = "Data Structure After Cleaning")
Data Structure After Cleaning
statistic value
Respondents 85
Tasks per respondent 8 to 8
Items per valid task 4 to 4
Items in study 10
Rows used 2720
Rows excluded 1190
Invalid MaxDiff tasks after cleaning 0
Code
exposure <- maxdiff |>
  count(item, name = "times_shown") |>
  left_join(labels, by = "item") |>
  arrange(item)

exposure |>
  select(item, course, label, times_shown) |>
  gt() |>
  tab_header(title = "Exposure Balance by Course")
Exposure Balance by Course
item course label times_shown
1 MGTA 403 MGTA 403 AI Math, Prog., & Analytics (Summer with Nijs) 273
2 MGTA 464 MGTA 464 SQL (Summer with Nijs) 274
3 MGTA 451 MGTA 451 Marketing/Finance/Operations (Summer with Wilbur/Buti/Shin) 272
4 MGTA 452 MGTA 452 Large Data (Fall with Hansen) 276
5 MGTA 453 MGTA 453 Business Analytics (Fall with August) 270
6 MGTA 444 MGTA 444 Analytics Consulting (Winter with Peterson) 267
7 MGTA 455 MGTA 455 Customer Analytics (Winter with Nijs) 276
8 MGTA 454 MGTA 454 Capstone Project (Spring with Advisor) 272
9 MGTA 495 MGTA 495 Marketing Analytics (Spring with Yavorsky) 274
10 MGTA 457 MGTA 457 Biz. Intelligence Systems (Fall with Schibler) 266

The cleaned data pass the key sanity check: every valid MaxDiff task has exactly four shown courses, one best pick, one worst pick, and two unchosen courses. Exposure is also reasonably balanced across courses, which is the point of a MaxDiff experimental design.

Counts Analysis

The simplest MaxDiff summary uses the sufficient statistic emphasized in the MaxDiff slides:

\[ \text{score}_j = \%\text{best}_j - \%\text{worst}_j. \]

For each course, I divide the number of best and worst picks by the number of times the course was shown. A positive score means a course is more often selected as best than worst. A negative score means it is more often selected as worst.

Code
counts <- maxdiff |>
  group_by(item) |>
  summarise(
    times_shown = n(),
    times_best = sum(response == 1),
    times_worst = sum(response == -1),
    pct_best = times_best / times_shown,
    pct_worst = times_worst / times_shown,
    counts_score = pct_best - pct_worst,
    .groups = "drop"
  ) |>
  left_join(labels, by = "item")

set.seed(495)
B <- 1000
respondent_blocks <- split(maxdiff, maxdiff$record_id)
respondent_ids <- names(respondent_blocks)

bootstrap_scores <- map_dfr(seq_len(B), function(b) {
  sampled_ids <- sample(respondent_ids, length(respondent_ids), replace = TRUE)
  boot_df <- bind_rows(respondent_blocks[sampled_ids])

  boot_df |>
    group_by(item) |>
    summarise(
      score = sum(response == 1) / n() - sum(response == -1) / n(),
      .groups = "drop"
    ) |>
    mutate(rep = b)
})

count_intervals <- bootstrap_scores |>
  group_by(item) |>
  summarise(
    counts_low = quantile(score, 0.025),
    counts_high = quantile(score, 0.975),
    .groups = "drop"
  )

counts <- counts |>
  left_join(count_intervals, by = "item") |>
  mutate(counts_rank = min_rank(desc(counts_score))) |>
  arrange(counts_rank)
Code
counts |>
  select(
    rank = counts_rank,
    course,
    label,
    times_shown,
    times_best,
    times_worst,
    pct_best,
    pct_worst,
    counts_score,
    counts_low,
    counts_high
  ) |>
  gt() |>
  fmt_percent(columns = c(pct_best, pct_worst, counts_score, counts_low, counts_high), decimals = 1) |>
  tab_header(title = "Counts Analysis: Percent Best Minus Percent Worst")
Counts Analysis: Percent Best Minus Percent Worst
rank course label times_shown times_best times_worst pct_best pct_worst counts_score counts_low counts_high
1 MGTA 495 MGTA 495 Marketing Analytics (Spring with Yavorsky) 274 147 12 53.6% 4.4% 49.3% 40.6% 57.8%
2 MGTA 453 MGTA 453 Business Analytics (Fall with August) 270 93 55 34.4% 20.4% 14.1% 1.1% 27.1%
3 MGTA 455 MGTA 455 Customer Analytics (Winter with Nijs) 276 104 71 37.7% 25.7% 12.0% −2.9% 26.9%
4 MGTA 452 MGTA 452 Large Data (Fall with Hansen) 276 77 60 27.9% 21.7% 6.2% −5.8% 17.1%
5 MGTA 454 MGTA 454 Capstone Project (Spring with Advisor) 272 72 69 26.5% 25.4% 1.1% −11.8% 13.4%
6 MGTA 464 MGTA 464 SQL (Summer with Nijs) 274 55 56 20.1% 20.4% −0.4% −10.8% 9.6%
7 MGTA 457 MGTA 457 Biz. Intelligence Systems (Fall with Schibler) 266 23 55 8.6% 20.7% −12.0% −19.9% −3.8%
8 MGTA 403 MGTA 403 AI Math, Prog., & Analytics (Summer with Nijs) 273 33 90 12.1% 33.0% −20.9% −30.4% −10.5%
9 MGTA 451 MGTA 451 Marketing/Finance/Operations (Summer with Wilbur/Buti/Shin) 272 43 101 15.8% 37.1% −21.3% −32.6% −9.7%
10 MGTA 444 MGTA 444 Analytics Consulting (Winter with Peterson) 267 33 111 12.4% 41.6% −29.2% −40.0% −18.3%
Code
ggplot(
  counts,
  aes(x = counts_score, y = fct_reorder(course, counts_score), fill = course)
) +
  geom_col(width = 0.72, show.legend = FALSE) +
  geom_errorbar(aes(xmin = counts_low, xmax = counts_high), width = 0.18, orientation = "y") +
  geom_vline(xintercept = 0, color = "gray45") +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(values = course_palette) +
  labs(
    title = "Simple MaxDiff Counts Scores",
    subtitle = "Bootstrap intervals resample respondents, preserving each respondent's set of tasks",
    x = "% best minus % worst",
    y = NULL
  )

The counts analysis produces a clear top course: MGTA 495 Marketing Analytics. MGTA 453 Business Analytics and MGTA 455 Customer Analytics also perform well. The least preferred courses by this simple measure are MGTA 444 Analytics Consulting, MGTA 451 Marketing/Finance/Operations, and MGTA 403 AI Math, Programming, and Analytics. Counts are easy to explain, but they do not yet model the choice process explicitly.

From MaxDiff Data to MNL Choices

The conjoint and MNL slides translate a choice task into a softmax probability:

\[ P(\text{choice}=j) = \frac{\exp(V_j)}{\sum_k \exp(V_k)}. \]

For this MaxDiff assignment, item identity is the only “attribute,” so each course gets its own utility coefficient:

\[ V_j = \beta_j. \]

A MaxDiff screen gives two implicit MNL observations. First, the respondent picks the best course from the four shown courses. Second, after removing the best course, the respondent picks the worst course from the remaining three. The worst-choice probability uses flipped utilities:

\[ P(\text{worst}=j) = \frac{\exp(-\beta_j)}{\sum_k \exp(-\beta_k)}. \]

Only utility differences are identified in a softmax model: adding the same constant to every \(\beta_j\) leaves every choice probability unchanged. I therefore set \(\beta_1 = 0\) and estimate the remaining nine coefficients relative to MGTA 403.

Code
choice_tasks <- maxdiff |>
  arrange(record_id, task, position) |>
  group_by(record_id, task) |>
  summarise(
    shown = list(item),
    best = first(item[response == 1]),
    worst = first(item[response == -1]),
    .groups = "drop"
  )

shown_mat <- do.call(rbind, choice_tasks$shown)
best_vec <- choice_tasks$best
worst_vec <- choice_tasks$worst
n_tasks <- nrow(shown_mat)

remaining_mat <- t(vapply(seq_len(n_tasks), function(i) {
  shown_mat[i, shown_mat[i, ] != best_vec[i]]
}, numeric(3)))

MNL via Maximum Likelihood

The maximum likelihood estimate chooses the \(\beta\) values that make the observed best and worst choices as probable as possible. For one task, the contribution is:

\[ \log P(\text{best}) + \log P(\text{worst} \mid \text{best removed}). \]

Summing over all tasks gives the log-likelihood:

\[ \ell(\boldsymbol{\beta}) = \sum_{\text{tasks}} \left[ \beta_{j_b^*} - \log \sum_{j \in \text{shown}}\exp(\beta_j) - \beta_{j_w^*} - \log \sum_{j \in \text{shown}\setminus j_b^*}\exp(-\beta_j) \right]. \]

The implementation evaluates both denominator terms with a log-sum-exp helper rather than taking log(sum(exp(x))) directly. This is numerically more stable because it subtracts the largest utility before exponentiating, so very large or very small utilities are less likely to overflow or underflow during optimization and MCMC.

Code
maxdiff_ll <- function(beta_free) {
  beta <- c(0, beta_free)

  best_utilities <- matrix(beta[shown_mat], nrow = n_tasks, ncol = 4)
  worst_utilities <- matrix(-beta[remaining_mat], nrow = n_tasks, ncol = 3)

  sum(
    beta[best_vec] -
      row_log_sum_exp(best_utilities) +
      (-beta[worst_vec]) -
      row_log_sum_exp(worst_utilities)
  )
}

mle_fit <- optim(
  par = rep(0, 9),
  fn = maxdiff_ll,
  method = "BFGS",
  hessian = TRUE,
  control = list(fnscale = -1, maxit = 5000)
)

mle_beta <- c(0, mle_fit$par)
mle_se <- c(NA_real_, sqrt(diag(solve(-mle_fit$hessian))))
mle_share <- exp(mle_beta) / sum(exp(mle_beta))

mle_results <- tibble(
  item = 1:10,
  beta_mle = mle_beta,
  se_mle = mle_se,
  mle_share = mle_share
) |>
  left_join(labels, by = "item") |>
  mutate(mle_rank = min_rank(desc(mle_share))) |>
  arrange(mle_rank)
Code
mle_results |>
  select(rank = mle_rank, course, label, beta_mle, se_mle, mle_share) |>
  gt() |>
  fmt_number(columns = c(beta_mle, se_mle), decimals = 3) |>
  fmt_percent(columns = mle_share, decimals = 1) |>
  sub_missing(columns = se_mle, missing_text = "Reference") |>
  tab_header(title = "MNL Maximum Likelihood Estimates")
MNL Maximum Likelihood Estimates
rank course label beta_mle se_mle mle_share
1 MGTA 495 MGTA 495 Marketing Analytics (Spring with Yavorsky) 1.630 0.146 28.4%
2 MGTA 453 MGTA 453 Business Analytics (Fall with August) 0.744 0.142 11.7%
3 MGTA 455 MGTA 455 Customer Analytics (Winter with Nijs) 0.656 0.142 10.7%
4 MGTA 452 MGTA 452 Large Data (Fall with Hansen) 0.555 0.140 9.7%
5 MGTA 454 MGTA 454 Capstone Project (Spring with Advisor) 0.443 0.140 8.7%
6 MGTA 464 MGTA 464 SQL (Summer with Nijs) 0.438 0.137 8.6%
7 MGTA 457 MGTA 457 Biz. Intelligence Systems (Fall with Schibler) 0.236 0.137 7.0%
8 MGTA 403 MGTA 403 AI Math, Prog., & Analytics (Summer with Nijs) 0.000 Reference 5.6%
9 MGTA 451 MGTA 451 Marketing/Finance/Operations (Summer with Wilbur/Buti/Shin) −0.067 0.139 5.2%
10 MGTA 444 MGTA 444 Analytics Consulting (Winter with Peterson) −0.239 0.141 4.4%
Code
ggplot(
  mle_results,
  aes(x = mle_share, y = fct_reorder(course, mle_share), fill = course)
) +
  geom_col(width = 0.72, show.legend = FALSE) +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(values = course_palette) +
  labs(
    title = "MNL Shares of Preference from Maximum Likelihood",
    subtitle = "Shares answer: if all 10 classes were shown together, what share of choices would each receive?",
    x = "Share of preference",
    y = NULL
  )

The MLE ranking is very similar to the counts ranking, especially at the top. This is expected from the MNL slides: MaxDiff counts are a strong summary of aggregate best-worst information. The model, however, gives a probability-based interpretation and standard errors from the Hessian.

MNL via Bayesian Estimation

The Bayesian version uses the same likelihood, but combines it with a weakly informative prior:

\[ \boldsymbol{\beta} \sim N(\mathbf{0}, 10I). \]

The posterior is proportional to likelihood times prior:

\[ \pi(\boldsymbol{\beta}\mid \boldsymbol{y}) \propto L(\boldsymbol{\beta})\pi(\boldsymbol{\beta}). \]

Because the softmax likelihood has no conjugate prior, I use the random-walk Metropolis algorithm from the Bayesian slides. The proposal is:

\[ \boldsymbol{\beta}^* = \boldsymbol{\beta}_{t-1} + \epsilon, \qquad \epsilon \sim N(\mathbf{0}, s^2I). \]

Code
log_post <- function(beta_free) {
  maxdiff_ll(beta_free) + sum(dnorm(beta_free, 0, sqrt(10), log = TRUE))
}

mh_sampler <- function(beta0, n_iter, step) {
  K <- length(beta0)
  draws <- matrix(NA_real_, n_iter, K)
  beta <- beta0
  lp <- log_post(beta)
  accept <- 0L

  for (t in seq_len(n_iter)) {
    beta_star <- beta + rnorm(K, 0, step)
    lp_star <- log_post(beta_star)

    if (log(runif(1)) < lp_star - lp) {
      beta <- beta_star
      lp <- lp_star
      accept <- accept + 1L
    }

    draws[t, ] <- beta
  }

  list(draws = draws, accept_rate = accept / n_iter)
}

calc_rhat <- function(draw_mat) {
  n <- nrow(draw_mat)
  m <- ncol(draw_mat)
  chain_means <- colMeans(draw_mat)
  chain_vars <- apply(draw_mat, 2, var)
  W <- mean(chain_vars)
  B <- n * var(chain_means)
  var_hat <- ((n - 1) / n) * W + (B / n)
  sqrt(var_hat / W)
}

ess_one_chain <- function(x, max_lag = 1000) {
  x <- as.numeric(x)
  n <- length(x)

  if (n < 3 || sd(x) == 0) {
    return(NA_real_)
  }

  ac <- as.numeric(acf(x, lag.max = min(max_lag, n - 1), plot = FALSE)$acf)[-1]
  first_nonpositive <- which(ac <= 0)[1]

  if (!is.na(first_nonpositive)) {
    ac <- ac[seq_len(first_nonpositive - 1)]
  }

  tau <- 1 + 2 * sum(ac)
  min(n / tau, n)
}

calc_ess <- function(draw_mat) {
  ess_chains <- apply(draw_mat, 2, ess_one_chain)
  min(sum(ess_chains, na.rm = TRUE), length(draw_mat))
}

set.seed(495)
n_iter <- 20000
step_size <- 0.07
burn <- floor(0.25 * n_iter)
n_chains <- 4

initial_values <- list(
  chain_1 = rep(0, 9),
  chain_2 = rep(1.0, 9),
  chain_3 = rep(-1.0, 9),
  chain_4 = rnorm(9, 0, 1.5)
)

mh_fits <- map(
  initial_values,
  ~ mh_sampler(beta0 = .x, n_iter = n_iter, step = step_size)
)

free_names <- paste0("beta_", 2:10)

chain_array <- array(
  NA_real_,
  dim = c(n_iter, 9, n_chains),
  dimnames = list(iteration = NULL, parameter = free_names, chain = names(initial_values))
)

for (ch in seq_along(mh_fits)) {
  chain_array[, , ch] <- mh_fits[[ch]]$draws
}

post_free_array <- chain_array[(burn + 1):n_iter, , , drop = FALSE]
post_free <- do.call(rbind, lapply(seq_len(n_chains), function(ch) {
  post_free_array[, , ch, drop = FALSE][, , 1]
}))
colnames(post_free) <- free_names
post_beta <- cbind(beta_1 = 0, post_free)
colnames(post_beta) <- paste0("beta_", 1:10)

acceptance_tbl <- tibble(
  chain = names(mh_fits),
  start = c("all zeros", "all +1", "all -1", "random N(0, 1.5)"),
  accept_rate = map_dbl(mh_fits, "accept_rate")
)

accept_rate_mean <- mean(acceptance_tbl$accept_rate)
accept_rate_range <- range(acceptance_tbl$accept_rate)

rhat_values <- apply(post_free_array, 2, calc_rhat)
ess_values <- apply(post_free_array, 2, calc_ess)

mcmc_diagnostics <- tibble(
  parameter = free_names,
  rhat = as.numeric(rhat_values),
  ess = as.numeric(ess_values)
)

share_draws <- exp(post_beta)
share_draws <- sweep(share_draws, 1, rowSums(share_draws), "/")
colnames(share_draws) <- paste0("share_", 1:10)

bayes_results <- tibble(
  item = 1:10,
  beta_mean = colMeans(post_beta),
  beta_low = apply(post_beta, 2, quantile, 0.025),
  beta_high = apply(post_beta, 2, quantile, 0.975),
  bayes_share = colMeans(share_draws),
  share_low = apply(share_draws, 2, quantile, 0.025),
  share_high = apply(share_draws, 2, quantile, 0.975)
) |>
  left_join(labels, by = "item") |>
  mutate(bayes_rank = min_rank(desc(bayes_share))) |>
  arrange(bayes_rank)

The mean sampler acceptance rate is 35.2% across four chains, with chain-specific rates from 34.7% to 35.8%. This falls in the 25-40% range recommended in the Bayesian MCMC slides.

Starting all chains at the MLE would make the trace plots look tidy, but it would not be a strong diagnostic: the chain would already begin near the high-posterior region. Here the four chains start from deliberately dispersed values. If those chains move toward the same stationary region after burn-in, it is evidence that the result is not just an artifact of one convenient starting point. R-hat formalizes this idea by comparing between-chain variation to within-chain variation; values close to 1 indicate that chains agree. ESS asks a different question: after accounting for autocorrelation, how many independent draws is the MCMC output roughly worth?

Code
trace_plot_data <- map_dfr(seq_len(n_chains), function(ch) {
  as_tibble(chain_array[, c("beta_2", "beta_5", "beta_9"), ch]) |>
    mutate(
      iteration = row_number(),
      chain = names(initial_values)[ch]
    )
}) |>
  pivot_longer(
    cols = c(beta_2, beta_5, beta_9),
    names_to = "parameter",
    values_to = "draw"
  )

ggplot(trace_plot_data, aes(x = iteration, y = draw, color = chain)) +
  geom_line(linewidth = 0.22, alpha = 0.55) +
  geom_vline(xintercept = burn, linetype = "dashed", color = "#E15759") +
  facet_wrap(~ parameter, scales = "free_y", ncol = 1) +
  labs(
    title = "Metropolis Trace Plots from Dispersed Initial Values",
    subtitle = "Dashed line marks the burn-in cutoff. Multiple chains move toward the same posterior region.",
    x = "Iteration",
    y = "Beta draw",
    color = NULL
  )

Code
mcmc_diagnostics |>
  gt() |>
  fmt_number(columns = rhat, decimals = 3) |>
  fmt_number(columns = ess, decimals = 0) |>
  tab_header(title = "MCMC Diagnostics After Burn-in")
MCMC Diagnostics After Burn-in
parameter rhat ess
beta_2 1.001 507
beta_3 1.003 503
beta_4 1.002 507
beta_5 1.002 522
beta_6 1.003 465
beta_7 1.003 465
beta_8 1.003 489
beta_9 1.002 473
beta_10 1.003 540
Code
bayes_results |>
  select(
    rank = bayes_rank,
    course,
    label,
    beta_mean,
    beta_low,
    beta_high,
    bayes_share,
    share_low,
    share_high
  ) |>
  gt() |>
  fmt_number(columns = c(beta_mean, beta_low, beta_high), decimals = 3) |>
  fmt_percent(columns = c(bayes_share, share_low, share_high), decimals = 1) |>
  tab_header(title = "Bayesian MNL Posterior Summaries")
Bayesian MNL Posterior Summaries
rank course label beta_mean beta_low beta_high bayes_share share_low share_high
1 MGTA 495 MGTA 495 Marketing Analytics (Spring with Yavorsky) 1.629 1.355 1.910 28.4% 24.3% 33.0%
2 MGTA 453 MGTA 453 Business Analytics (Fall with August) 0.738 0.469 1.038 11.7% 9.6% 14.1%
3 MGTA 455 MGTA 455 Customer Analytics (Winter with Nijs) 0.654 0.383 0.934 10.7% 8.8% 13.0%
4 MGTA 452 MGTA 452 Large Data (Fall with Hansen) 0.552 0.284 0.831 9.7% 7.9% 11.6%
5 MGTA 464 MGTA 464 SQL (Summer with Nijs) 0.435 0.175 0.707 8.6% 7.1% 10.4%
6 MGTA 454 MGTA 454 Capstone Project (Spring with Advisor) 0.435 0.165 0.708 8.6% 7.0% 10.4%
7 MGTA 457 MGTA 457 Biz. Intelligence Systems (Fall with Schibler) 0.233 −0.030 0.509 7.0% 5.8% 8.5%
8 MGTA 403 MGTA 403 AI Math, Prog., & Analytics (Summer with Nijs) 0.000 0.000 0.000 5.6% 4.6% 6.7%
9 MGTA 451 MGTA 451 Marketing/Finance/Operations (Summer with Wilbur/Buti/Shin) −0.071 −0.348 0.208 5.2% 4.2% 6.4%
10 MGTA 444 MGTA 444 Analytics Consulting (Winter with Peterson) −0.243 −0.524 0.035 4.4% 3.5% 5.3%
Code
ggplot(
  bayes_results,
  aes(x = bayes_share, y = fct_reorder(course, bayes_share), fill = course)
) +
  geom_col(width = 0.72, show.legend = FALSE) +
  geom_errorbar(aes(xmin = share_low, xmax = share_high), width = 0.18, orientation = "y") +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(values = course_palette) +
  labs(
    title = "Bayesian MNL Shares of Preference",
    subtitle = "Intervals are posterior 95% credible intervals for each share",
    x = "Posterior share of preference",
    y = NULL
  )

Code
mle_bayes_check <- mle_results |>
  select(item, course, label, beta_mle, mle_share) |>
  left_join(
    bayes_results |>
      select(item, beta_mean, bayes_share),
    by = "item"
  )

beta_alignment_cor <- cor(mle_bayes_check$beta_mle, mle_bayes_check$beta_mean)
share_alignment_cor <- cor(mle_bayes_check$mle_share, mle_bayes_check$bayes_share)

mle_bayes_check |>
  arrange(desc(mle_share)) |>
  gt() |>
  fmt_number(columns = c(beta_mle, beta_mean), decimals = 3) |>
  fmt_percent(columns = c(mle_share, bayes_share), decimals = 1) |>
  tab_header(title = "MLE and Bayesian Estimates Are Consistent")
MLE and Bayesian Estimates Are Consistent
item course label beta_mle mle_share beta_mean bayes_share
9 MGTA 495 MGTA 495 Marketing Analytics (Spring with Yavorsky) 1.630 28.4% 1.629 28.4%
5 MGTA 453 MGTA 453 Business Analytics (Fall with August) 0.744 11.7% 0.738 11.7%
7 MGTA 455 MGTA 455 Customer Analytics (Winter with Nijs) 0.656 10.7% 0.654 10.7%
4 MGTA 452 MGTA 452 Large Data (Fall with Hansen) 0.555 9.7% 0.552 9.7%
8 MGTA 454 MGTA 454 Capstone Project (Spring with Advisor) 0.443 8.7% 0.435 8.6%
2 MGTA 464 MGTA 464 SQL (Summer with Nijs) 0.438 8.6% 0.435 8.6%
10 MGTA 457 MGTA 457 Biz. Intelligence Systems (Fall with Schibler) 0.236 7.0% 0.233 7.0%
1 MGTA 403 MGTA 403 AI Math, Prog., & Analytics (Summer with Nijs) 0.000 5.6% 0.000 5.6%
3 MGTA 451 MGTA 451 Marketing/Finance/Operations (Summer with Wilbur/Buti/Shin) −0.067 5.2% −0.071 5.2%
6 MGTA 444 MGTA 444 Analytics Consulting (Winter with Peterson) −0.239 4.4% −0.243 4.4%
Code
ggplot(
  mle_bayes_check,
  aes(x = beta_mle, y = beta_mean, color = course, label = course)
) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "#6b7280") +
  geom_point(size = 3) +
  geom_text(nudge_y = 0.08, size = 3, show.legend = FALSE, check_overlap = TRUE) +
  scale_color_manual(values = course_palette, drop = FALSE) +
  labs(
    title = "MLE and Bayesian Posterior Means Align",
    subtitle = "Dashed line marks exact equality between the two estimates.",
    x = "MLE beta",
    y = "Posterior mean beta",
    color = NULL
  )

The Bayesian posterior means are close to the MLE point estimates: the beta correlation is 1.000, and the preference-share correlation is 1.000. This makes sense because the prior is weak and the data are informative. The benefit of the Bayesian workflow is that uncertainty for derived quantities is direct: I compute shares for every posterior draw and then summarize those share draws with means and credible intervals.

Comparing the Three Methods

Code
comparison <- counts |>
  select(item, course, label, counts_score, counts_rank) |>
  left_join(mle_results |> select(item, beta_mle, se_mle, mle_share, mle_rank), by = "item") |>
  left_join(
    bayes_results |>
      select(item, beta_mean, beta_low, beta_high, bayes_share, share_low, share_high, bayes_rank),
    by = "item"
  ) |>
  arrange(mle_rank)

comparison |>
  select(
    course,
    label,
    counts_score,
    counts_rank,
    mle_share,
    mle_rank,
    bayes_share,
    bayes_rank
  ) |>
  gt() |>
  fmt_percent(columns = c(counts_score, mle_share, bayes_share), decimals = 1) |>
  tab_header(title = "Comparison of Counts, MLE MNL, and Bayesian MNL")
Comparison of Counts, MLE MNL, and Bayesian MNL
course label counts_score counts_rank mle_share mle_rank bayes_share bayes_rank
MGTA 495 MGTA 495 Marketing Analytics (Spring with Yavorsky) 49.3% 1 28.4% 1 28.4% 1
MGTA 453 MGTA 453 Business Analytics (Fall with August) 14.1% 2 11.7% 2 11.7% 2
MGTA 455 MGTA 455 Customer Analytics (Winter with Nijs) 12.0% 3 10.7% 3 10.7% 3
MGTA 452 MGTA 452 Large Data (Fall with Hansen) 6.2% 4 9.7% 4 9.7% 4
MGTA 454 MGTA 454 Capstone Project (Spring with Advisor) 1.1% 5 8.7% 5 8.6% 6
MGTA 464 MGTA 464 SQL (Summer with Nijs) −0.4% 6 8.6% 6 8.6% 5
MGTA 457 MGTA 457 Biz. Intelligence Systems (Fall with Schibler) −12.0% 7 7.0% 7 7.0% 7
MGTA 403 MGTA 403 AI Math, Prog., & Analytics (Summer with Nijs) −20.9% 8 5.6% 8 5.6% 8
MGTA 451 MGTA 451 Marketing/Finance/Operations (Summer with Wilbur/Buti/Shin) −21.3% 9 5.2% 9 5.2% 9
MGTA 444 MGTA 444 Analytics Consulting (Winter with Peterson) −29.2% 10 4.4% 10 4.4% 10
Code
comparison_long <- bind_rows(
  comparison |>
    transmute(method = "Counts score", course, value = counts_score),
  comparison |>
    transmute(method = "MLE share", course, value = mle_share),
  comparison |>
    transmute(method = "Bayes share", course, value = bayes_share)
) |>
  mutate(
    method = factor(method, levels = c("Counts score", "MLE share", "Bayes share")),
    course_panel = fct_reorder(paste(method, course, sep = "__"), value)
  )

ggplot(comparison_long, aes(x = value, y = course_panel, fill = course)) +
  geom_col(width = 0.72, show.legend = FALSE) +
  facet_wrap(~ method, scales = "free") +
  scale_y_discrete(labels = function(x) str_replace(x, "^.*__", "")) +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(values = course_palette, drop = FALSE) +
  labs(
    title = "Three Views of the Same Preference Signal",
    subtitle = "Counts are descriptive; MLE and Bayes model the same best-worst choice process.",
    x = NULL,
    y = NULL
  )

All three methods agree that MGTA 495 is the most preferred course in this sample. They also broadly agree on the least preferred courses. Differences are most visible in the middle of the ranking, where course utilities are closer together and small amounts of sampling noise can flip ranks.

The MNL model adds structure over counts because it explicitly models each choice set. A shown but unchosen item still contributes information: it lost that particular comparison. Bayesian MNL adds a convenient way to communicate uncertainty for transformed quantities, such as shares of preference, because every posterior draw can be pushed through the softmax.

Discussion

Substantively, MSBA students in this survey show the strongest preference for MGTA 495 Marketing Analytics, followed by MGTA 453 Business Analytics and MGTA 455 Customer Analytics. That ranking is reassuring for this assignment, since the analysis is being conducted in the marketing analytics course itself, but it may also reflect the sample’s interests and timing.

Methodologically, the three approaches answer related but slightly different questions. Counts are best for a quick dashboard-style summary. MLE MNL is better when we want a formal choice model, point estimates, standard errors, and a probability interpretation. Bayesian MNL is most useful when we want uncertainty on derived quantities or when we want to extend the model to hierarchical Bayes and individual-level preferences.

The main caveat is external validity. The respondents are a self-selected sample of MSBA students, not a random sample of every possible student. Preferences may also depend on prior coursework, the quarter when the class is taken, instructor expectations, and current career goals. With more time, I would extend this model hierarchically to estimate respondent-level part-worths and then ask whether preference segments exist within the MSBA cohort.

Back to top