🎨 Selection Bias & Missing Data Challenge - Part 1
📊 Challenge Requirements
Your Task: Reproduce the blue noise stippling process demonstrated below to create:
A stippled version of your chosen image
A progressive stippling GIF animation
Post both to a GitHub Pages site with appropriate captions and a brief explanation
Part 2 Preview: On November 18th, we’ll tackle Part 2 of this challenge, where you’ll create a statistical meme about selection bias and missing data using your stippled images.
The Problem: Can Algorithms Create Art?
Core Question: How can we convert a photograph into an aesthetically pleasing pattern of dots that preserves the visual information of the original image?
The Challenge: Blue noise stippling is a technique that converts images into patterns of dots (stipples) using algorithms that balance visual accuracy with spatial distribution. This challenge asks you to implement a modified “void and cluster” algorithm that combines importance sampling with blue noise distribution properties to create stippling patterns that are both visually accurate and spatially well-distributed.
Our Approach: We’ll use a modified void-and-cluster algorithm that: 1. Creates an importance map identifying visually important regions 2. Uses a toroidal (periodic) Gaussian kernel for repulsion to ensure blue noise properties 3. Iteratively selects points with minimum energy 4. Balances image content importance with blue noise spatial distribution
⚠️ AI Partnership Required
This challenge pushes boundaries intentionally. You’ll tackle problems that normally require weeks of study, but with Cursor AI as your partner (and your brain keeping it honest), you can accomplish more than you thought possible.
The new reality: The four stages of competence are Ignorance → Awareness → Learning → Mastery. AI lets us produce Mastery-level work while operating primarily in the Awareness stage. I focus on awareness training, you leverage AI for execution, and together we create outputs that used to require years of dedicated study.
Introduction to Blue Noise Stippling
Blue noise stippling is a technique for converting images into a pattern of dots (stipples) that preserves the visual information of the original image while creating an aesthetically pleasing, evenly distributed pattern. This method follows the approach described by Bart Wronski.
The method uses a modified “void and cluster” algorithm that combines importance sampling with blue noise distribution properties to create stippling patterns that are both visually accurate and spatially well-distributed. This version uses smooth extreme downweighting that selectively downweights very dark and very light tones while preserving mid-tones, creating a more balanced distribution of stipples across the image.
Loading the Original Image
First, let’s load an image that we’ll convert to a blue noise stippling pattern. You can use any image you’d like, but we’ll demonstrate with the provided example.
import numpy as npfrom PIL import Imageimport matplotlib.pyplot as plt# Load the imageimg_path ='fleischhacker.jpg'original_img = Image.open(img_path)# Convert to grayscale if neededif original_img.mode !='L': original_img = original_img.convert('L')# Convert to numpy array and normalize to [0, 1]img_array = np.array(original_img, dtype=np.float32) /255.0# Display the original imagefig, ax = plt.subplots(figsize=(6.5, 5))ax.imshow(img_array, cmap='gray', vmin=0, vmax=1)ax.axis('off')
Warning: package 'imager' was built under R version 4.5.2
Loading required package: magrittr
Attaching package: 'imager'
The following object is masked from 'package:magrittr':
add
The following objects are masked from 'package:stats':
convolve, spectrum
The following object is masked from 'package:graphics':
frame
The following object is masked from 'package:base':
save.image
library(ggplot2)# Load the imageimg_path <-'fleischhacker.jpg'original_img <-load.image(img_path)# Convert to grayscale if needed# Check channels (dim[3]), not depth (dim[4])if(dim(original_img)[3] ==3) { original_img <-grayscale(original_img)}# Display the original imageplot(original_img, axes =FALSE, main ="Original Image")
Before applying the stippling algorithm, we create an importance map that identifies which regions of the image should receive more stipples. The importance map is computed by:
Brightness inversion: The image brightness is inverted so that dark areas receive higher importance and thus more dots, while light areas receive fewer dots
Extreme tone downweighting: Smooth Gaussian functions downweight tones below 0.2 (very dark) and above 0.8 (very light), creating a gradual transition that preserves mid-tones
Mid-tone boost: A smooth Gaussian function centered on mid-tones provides a gradual increase in importance for mid-tone regions, ensuring they receive appropriate stippling density
Selective and effective: This approach ensures that stipples are distributed appropriately (more dots in dark areas and mid-tones, fewer in extreme dark/light areas) while maintaining good spatial distribution
🎯 Key Tuning Point: compute_importance Function
This function is all you need to control dot distribution! The compute_importance function is the primary mechanism for tuning where more or fewer dots appear in your stippled image. By adjusting its parameters, you can:
Control which tones get more dots: Adjust extreme_threshold_low and extreme_threshold_high to define what counts as “extreme” tones
Reduce dots in extreme regions: Increase extreme_downweight (0.0-1.0) to reduce stipples in very dark or very light areas
Boost specific tone ranges: Adjust mid_tone_boost and mid_tone_center to emphasize particular brightness ranges (e.g., skin tones around 0.7)
Control transition smoothness: Modify extreme_sigma and mid_tone_sigma to make transitions sharper or more gradual
No need to modify the stippling algorithm itself—all the tuning happens in this function. Experiment with different parameter values to achieve your desired dot distribution!
import numpy as npdef toroidal_gaussian_kernel(h: int, w: int, sigma: float):""" Create a periodic (toroidal) 2D Gaussian kernel centered at (0,0). The toroidal property means the kernel wraps around at the edges, ensuring consistent repulsion behavior regardless of point location. """ y = np.arange(h) x = np.arange(w)# Compute toroidal distances (minimum distance considering wrapping) dy = np.minimum(y, h - y)[:, None] dx = np.minimum(x, w - x)[None, :]# Compute Gaussian kern = np.exp(-(dx**2+ dy**2) / (2.0* sigma**2)) s = kern.sum()if s >0: kern /= s # Normalizereturn kerndef void_and_cluster( input_img: np.ndarray, percentage: float=0.08, sigma: float=0.9, content_bias: float=0.9, importance_img: np.ndarray |None=None, noise_scale_factor: float=0.1,):""" Generate blue noise stippling pattern from input image using a modified void-and-cluster algorithm with content-weighted importance. """ I = np.clip(input_img, 0.0, 1.0) h, w = I.shape# Compute or use provided importance mapif importance_img isNone: importance = compute_importance(I)else: importance = np.clip(importance_img, 0.0, 1.0)# Create toroidal Gaussian kernel for repulsion kernel = toroidal_gaussian_kernel(h, w, sigma)# Initialize energy field: lower energy → more likely to be picked energy_current =-importance * content_bias# Stipple buffer: start with white background; selected points become black dots final_stipple = np.ones_like(I) samples = []# Helper function to roll kernel to an arbitrary positiondef energy_splat(y, x):"""Get energy contribution by rolling the kernel to position (y, x)."""return np.roll(np.roll(kernel, shift=y, axis=0), shift=x, axis=1)# Number of points to select num_points =int(I.size * percentage)# Choose first point near center with minimal energy cy, cx = h //2, w //2 r =min(20, h //10, w //10) ys =slice(max(0, cy - r), min(h, cy + r)) xs =slice(max(0, cx - r), min(w, cx + r)) region = energy_current[ys, xs] flat = np.argmin(region) y0 = flat // (region.shape[1]) + (cy - r) x0 = flat % (region.shape[1]) + (cx - r)# Place first point energy_current = energy_current + energy_splat(y0, x0) energy_current[y0, x0] = np.inf # Prevent reselection samples.append((y0, x0, I[y0, x0])) final_stipple[y0, x0] =0.0# Black dot# Iteratively place remaining pointsfor i inrange(1, num_points):# Add exploration noise that decreases over time exploration =1.0- (i / num_points) *0.5# Decrease from 1.0 to 0.5 noise = np.random.normal(0.0, noise_scale_factor * content_bias * exploration, size=energy_current.shape) energy_with_noise = energy_current + noise# Find position with minimum energy (with noise for exploration) pos_flat = np.argmin(energy_with_noise) y = pos_flat // w x = pos_flat % w# Add Gaussian splat to prevent nearby points from being selected energy_current = energy_current + energy_splat(y, x) energy_current[y, x] = np.inf # Prevent reselection# Record the sample samples.append((y, x, I[y, x])) final_stipple[y, x] =0.0# Black dotreturn final_stipple, np.array(samples)
# Note: R implementation uses true 2D circular shift (roll2d) to match Python's np.roll behaviortoroidal_gaussian_kernel <-function(h, w, sigma) { y <-0:(h-1) x <-0:(w-1)# Compute toroidal distances dy <-pmin(y, h - y) dx <-pmin(x, w - x)# Create distance matrices# each row gets same dy, each col gets same dx dy_mat <-matrix(rep(dy, each = w), nrow = h, ncol = w) dx_mat <-matrix(rep(dx, times = h), nrow = h, ncol = w)# Compute Gaussian kern <-exp(-(dx_mat^2+ dy_mat^2) / (2.0* sigma^2)) kern <- kern /sum(kern) # Normalizereturn(kern)}# Helper function for true 2D circular shift (equivalent to np.roll)# np.roll equivalent: positive shift moves down/rightroll2d <-function(mat, shift_y =0, shift_x =0) { h <-nrow(mat) w <-ncol(mat) sy <- ((shift_y %% h) + h) %% h sx <- ((shift_x %% w) + w) %% w rows <-if (sy ==0) 1:h elsec((h - sy +1):h, 1:(h - sy)) cols <-if (sx ==0) 1:w elsec((w - sx +1):w, 1:(w - sx)) mat[rows, cols, drop =FALSE]}void_and_cluster <-function(input_img, percentage =0.08,sigma =0.9,content_bias =0.9,importance_img =NULL,noise_scale_factor =0.1) {# Clip image to [0, 1] I <-pmax(pmin(input_img, 1.0), 0.0) h <-nrow(I) w <-ncol(I)# Compute or use provided importance mapif(is.null(importance_img)) { importance <-compute_importance(I) } else { importance <-pmax(pmin(importance_img, 1.0), 0.0) }# Create toroidal Gaussian kernel for repulsion kernel <-toroidal_gaussian_kernel(h, w, sigma)# Initialize energy field: lower energy → more likely to be picked energy_current <--importance * content_bias# Stipple buffer: start with white background; selected points become black dots final_stipple <-matrix(1.0, nrow = h, ncol = w) samples <-vector("list", as.integer(h * w * percentage))# Helper function to roll kernel to an arbitrary position# This implements the exact equivalent of Python's np.roll(kernel, shift=y, axis=0) # followed by np.roll(kernel, shift=x, axis=1)# Uses true 2D circular shift to preserve directionality and avoid quadrant-mirroring energy_splat <-function(y, x) {# exact np.roll(kernel, y; x)roll2d(kernel, shift_y = y -1, shift_x = x -1) }# Number of points to select num_points <-as.integer(h * w * percentage)# Set seed for reproducibilityset.seed(42)# --- first point: pick min in a center window --- cy <-as.integer(h /2) cx <-as.integer(w /2) r <-min(20L, as.integer(h /10), as.integer(w /10)) y_start <-max(1L, cy - r) y_end <-min(h, cy + r) x_start <-max(1L, cx - r) x_end <-min(w, cx + r) region <- energy_current[y_start:y_end, x_start:x_end] flat_idx <-which.min(region)# IMPORTANT: R is column-major -> (y,x) from flat:# y = ((idx-1) %% nrow) + 1 ; x = ((idx-1) %/% nrow) + 1 ry <- ((flat_idx -1) %%nrow(region)) +1 rx <- ((flat_idx -1) %/%nrow(region)) +1 y0 <- y_start + ry -1 x0 <- x_start + rx -1# Place first point energy_current <- energy_current +energy_splat(y0, x0) energy_current[y0, x0] <-Inf# Prevent reselection samples[[1]] <-c(y0, x0, I[y0, x0]) final_stipple[y0, x0] <-0.0# Black dot# --- iterate ---for(i in2:num_points) {# Add exploration noise that decreases over time exploration <-1.0- ((i -1) / num_points) *0.5# Decrease from 1.0 to 0.5 noise <-matrix(rnorm(h * w, 0, noise_scale_factor * content_bias * exploration), nrow = h, ncol = w) energy_with_noise <- energy_current + noise# Find position with minimum energy (with noise for exploration) pos_flat <-which.min(energy_with_noise)# Column-major unflatten:# R matrices are column-major, so y = (idx-1) %% h + 1, x = (idx-1) %/% h + 1 y <- ((pos_flat -1) %% h) +1 x <- ((pos_flat -1) %/% h) +1# Add Gaussian splat to prevent nearby points from being selected energy_current <- energy_current +energy_splat(y, x) energy_current[y, x] <-Inf# Prevent reselection# Record the sample samples[[i]] <-c(y, x, I[y, x]) final_stipple[y, x] <-0.0# Black dot }# Convert samples list to matrix samples_matrix <-do.call(rbind, samples[1:num_points])return(list(stipple = final_stipple, samples = samples_matrix))}
Preparing the Working Image
Before generating the stippling pattern, we prepare the image by resizing if necessary and computing the importance map.
Comparison of original image, importance map, and blue noise stippling
par(mfrow =c(1, 3), mar =c(2, 2, 2, 2))# Originalplot(img_resized, axes =FALSE, main ="Original Image")# Importance map (matrix [h, w]) -> transpose for imager ([x, y])plot(as.cimg(t(importance_map), x =ncol(importance_map), y =nrow(importance_map), cc =1), axes =FALSE, main ="Importance Map")# Stipple pattern (matrix [h, w]) -> transpose for imager ([x, y])plot(as.cimg(t(stipple_pattern), x =ncol(stipple_pattern), y =nrow(stipple_pattern), cc =1), axes =FALSE, main ="Blue Noise Stippling")
Comparison of original image, importance map, and blue noise stippling
Progressive Stippling Animation
This section creates a GIF showing how the stippled image looks as more points are added sequentially. We’ll use the already-computed stippling points to generate frames at increments of 100 points.
import numpy as npfrom PIL import Imageimport matplotlib.pyplot as pltfrom matplotlib.animation import PillowWriter# Use the existing samples array from the already-computed stipplingprint(f"Using existing stippling with {len(samples)} points")
Using existing stippling with 7500 points
print(f"Image shape: {img_resized.shape}")
Image shape: (375, 250)
# Create progressive frames by adding points sequentiallyframe_increment =100frames = []point_counts = []# Start with white backgroundh, w = img_resized.shapeprogressive_stipple = np.ones_like(img_resized)# Add first point and save initial frameiflen(samples) >0: y0, x0, intensity0 =int(samples[0, 0]), int(samples[0, 1]), samples[0, 2] progressive_stipple[y0, x0] =0.0 frames.append(progressive_stipple.copy()) point_counts.append(1)# Add remaining points sequentially and save frames at incrementsfor i inrange(1, len(samples)): y, x =int(samples[i, 0]), int(samples[i, 1]) progressive_stipple[y, x] =0.0# Add black dot# Save frame at increments (100, 200, 300, ...) and at the endif (i +1) % frame_increment ==0or i ==len(samples) -1: frames.append(progressive_stipple.copy()) point_counts.append(i +1)print(f"Generated {len(frames)} frames")
# Create progressive frames by adding points sequentiallyframe_increment <-100frames <-list()point_counts <-c()# Start with white backgroundprogressive_stipple <-matrix(1.0, nrow =nrow(img_matrix), ncol =ncol(img_matrix))# Add first point and save initial frameif(nrow(samples) >0) { y0 <-as.integer(samples[1, 1]) x0 <-as.integer(samples[1, 2]) progressive_stipple[y0, x0] <-0.0 frames[[1]] <- progressive_stipple point_counts <-c(point_counts, 1)}# Add remaining points sequentially and save frames at incrementsfor(i in2:nrow(samples)) { y <-as.integer(samples[i, 1]) x <-as.integer(samples[i, 2]) progressive_stipple[y, x] <-0.0# Add black dot# Save frame at increments (100, 200, 300, ...) and at the endif(i %% frame_increment ==0|| i ==nrow(samples)) { frames[[length(frames) +1]] <- progressive_stipple point_counts <-c(point_counts, i) }}cat("Generated", length(frames), "frames\n")
Progressive stippling animation showing the sequential build-up of points. Each frame represents an increment of 100 points, demonstrating how the blue noise stippling pattern develops as more points are added.
Appendix: Average Tone by Section
To understand the distribution of tones across the image and identify the center of skin tone, we divide the image into a grid of small sections and compute the average brightness in each section.
# Divide image into a grid of sectionsgrid_rows =16grid_cols =12h, w = img_resized.shapesection_h = h // grid_rowssection_w = w // grid_cols# Create arrays to store average tonesaverage_tones = np.zeros((grid_rows, grid_cols))section_coords = []# Compute average tone for each sectionfor i inrange(grid_rows):for j inrange(grid_cols): y_start = i * section_h y_end = (i +1) * section_h if i < grid_rows -1else h x_start = j * section_w x_end = (j +1) * section_w if j < grid_cols -1else w section = img_resized[y_start:y_end, x_start:x_end] avg_tone = np.mean(section) average_tones[i, j] = avg_tone section_coords.append((i, j, avg_tone))# Display the grid with average tonesfig, axes = plt.subplots(1, 2, figsize=(7, 5))# Show original image with grid overlayaxes[0].imshow(img_resized, cmap='gray', vmin=0, vmax=1)axes[0].set_title('Original Image with Grid', fontsize=14, fontweight='bold', pad=10)axes[0].axis('off')
# Draw grid linesfor i inrange(grid_rows +1): y = i * section_h axes[0].axhline(y, color='red', linewidth=0.5, alpha=0.5)for j inrange(grid_cols +1): x = j * section_w axes[0].axvline(x, color='red', linewidth=0.5, alpha=0.5)# Show heatmap of average tonesim = axes[1].imshow(average_tones, cmap='gray', vmin=0, vmax=1, aspect='auto')axes[1].set_title('Average Tone by Section', fontsize=14, fontweight='bold', pad=10)axes[1].set_xlabel('Column', fontsize=12)axes[1].set_ylabel('Row', fontsize=12)# Add text annotations showing average tone valuesfor i inrange(grid_rows):for j inrange(grid_cols): tone = average_tones[i, j]# Use white text for dark sections, black text for light sections text_color ='white'if tone <0.5else'black' axes[1].text(j, i, f'{tone:.2f}', ha='center', va='center', color=text_color, fontsize=6, fontweight='bold')plt.colorbar(im, ax=axes[1], label='Average Brightness')
<matplotlib.colorbar.Colorbar object at 0x000001ED157957F0>
# Find sections with tones close to skin tone (around 0.7)skin_tone_target =0.7tolerance =0.1skin_tone_sections = [(i, j, tone) for i, j, tone in section_coords ifabs(tone - skin_tone_target) < tolerance]print(f"\nSections with tone close to skin tone (0.7 ± {tolerance}):")
Sections with tone close to skin tone (0.7 ± 0.1):
if skin_tone_sections:for i, j, tone insorted(skin_tone_sections, key=lambda x: abs(x[2] - skin_tone_target)):print(f" Row {i}, Col {j}: {tone:.3f}")else:print(" None found")
Row 4, Col 6: 0.704
Row 9, Col 6: 0.709
Row 3, Col 6: 0.690
Row 3, Col 7: 0.711
Row 3, Col 3: 0.714
Row 4, Col 7: 0.685
Row 10, Col 6: 0.684
Row 9, Col 7: 0.679
Row 8, Col 4: 0.672
Row 7, Col 4: 0.736
Row 6, Col 4: 0.659
Row 8, Col 7: 0.742
Row 9, Col 5: 0.655
Row 1, Col 7: 0.654
Row 10, Col 7: 0.749
Row 7, Col 6: 0.646
Row 1, Col 4: 0.758
Row 7, Col 7: 0.637
Row 9, Col 4: 0.635
Row 3, Col 5: 0.634
Row 1, Col 5: 0.633
Row 12, Col 5: 0.632
Row 6, Col 7: 0.631
Row 11, Col 8: 0.628
Row 4, Col 8: 0.776
Row 12, Col 8: 0.782
Row 5, Col 3: 0.785
Row 4, Col 5: 0.614
Row 5, Col 8: 0.794
Row 8, Col 3: 0.796
Row 4, Col 3: 0.797
Row 6, Col 6: 0.601
Row 2, Col 8: 0.600
# Print full grid as tableprint(f"\nFull grid of average tones (row, col):")
Full grid of average tones (row, col):
print(" ", end="")
for j inrange(grid_cols):print(f"Col{j:2d} ", end="")
Col 0 Col 1 Col 2 Col 3 Col 4 Col 5 Col 6 Col 7 Col 8 Col 9 Col10 Col11
print()
for i inrange(grid_rows):print(f"Row{i:2d} ", end="")for j inrange(grid_cols):print(f"{average_tones[i, j]:.3f} ", end="")print()
# Divide image into a grid of sectionsgrid_rows <-16grid_cols <-12h <-nrow(img_matrix)w <-ncol(img_matrix)section_h <-as.integer(h / grid_rows)section_w <-as.integer(w / grid_cols)# Create arrays to store average tonesaverage_tones <-matrix(0, nrow = grid_rows, ncol = grid_cols)section_coords <-list()# Compute average tone for each sectionfor(i in1:grid_rows) {for(j in1:grid_cols) { y_start <- (i -1) * section_h +1 y_end <-if(i < grid_rows) i * section_h else h x_start <- (j -1) * section_w +1 x_end <-if(j < grid_cols) j * section_w else w section <- img_matrix[y_start:y_end, x_start:x_end] avg_tone <-mean(section) average_tones[i, j] <- avg_tone section_coords[[length(section_coords) +1]] <-c(i, j, avg_tone) }}# Display the grid with average tonespar(mfrow =c(1, 2), mar =c(2, 2, 2, 2))# Show original image with grid overlayplot(img_resized, axes =FALSE, main ="Original Image with Grid")# Draw grid lines (approximate)# Note: imager plots use different coordinate system, so grid lines are approximatefor(i in0:grid_rows) { y <- i * section_habline(h = y, col ='red', lwd =0.5, lty =2)}for(j in0:grid_cols) { x <- j * section_wabline(v = x, col ='red', lwd =0.5, lty =2)}# Show heatmap of average tonesimage(1:grid_cols, 1:grid_rows, t(average_tones), col =gray(seq(0, 1, length.out =256)),xlab ="Column", ylab ="Row",main ="Average Tone by Section",axes =TRUE)axis(1, at =1:grid_cols)axis(2, at =1:grid_rows)# Add text annotations showing average tone valuesfor(i in1:grid_rows) {for(j in1:grid_cols) { tone <- average_tones[i, j]# Use white text for dark sections, black text for light sections text_color <-if(tone <0.5) 'white'else'black'text(j, i, sprintf('%.2f', tone), col = text_color, cex =0.4, font =2) }}
# Find sections with tones close to skin tone (around 0.7)skin_tone_target <-0.7tolerance <-0.1skin_tone_sections <-Filter(function(x) abs(x[3] - skin_tone_target) < tolerance, section_coords)cat("\nSections with tone close to skin tone (0.7 ±", tolerance, "):\n")
Sections with tone close to skin tone (0.7 ± 0.1 ):