About This Site

This is a board game recommendation system built on the BoardGameGeek reviews dataset available on Kaggle. The site analyzes millions of user ratings to provide personalized game recommendations.

What This Site Can Do

  • Personalized Recommendations - Rate games you've played to get tailored suggestions for new games you'll likely enjoy
  • Intelligent Analysis - Uses machine learning to understand your gaming preferences across multiple dimensions (complexity, theme, mechanics, etc.)
  • Game Discovery - Browse and search through thousands of board games with detailed information
  • Preference Insights - Visualize your gaming personality and see how your tastes compare to other players
  • Data-Driven Predictions - Get confidence ratings and explanations for why certain games are recommended

The system becomes more accurate as you rate more games, with optimal performance achieved after rating 30+ games across different genres and complexity levels.


Mathematical Foundations of Boardgame Ratings

This document explains the mathematical techniques and algorithms used in our boardgame recommendation system. Understanding these concepts will help you appreciate how we transform user ratings into meaningful predictions and recommendations.

Table of Contents

  1. Overview: From Ratings to Vectors
  2. Singular Value Decomposition (SVD)
  3. Collaborative Filtering
  4. PostgreSQL Vector Similarity and Recommendations
  5. Linear Regression for New Users
  6. Statistical Analysis
  7. Implementation Details
  8. Myers-Briggs Type Indicator for Board Games

Overview: From Ratings to Vectors

Our recommendation system transforms sparse user-game rating data into dense mathematical vectors that capture latent features about both users and games. This transformation enables us to:

  • Predict ratings for games a user hasn't rated
  • Find similar games based on underlying characteristics
  • Recommend games tailored to individual preferences
  • Analyze patterns in user behavior and game properties

The core insight is that user preferences and game characteristics can be represented as points in a high-dimensional space, where proximity indicates similarity.

Singular Value Decomposition (SVD)

What is SVD?

Singular Value Decomposition is a fundamental matrix factorization technique that decomposes our rating matrix R (users × games) into three matrices:

R ≈ U × Σ × V^T

Where:

  • U contains user feature vectors (users × factors)
  • Σ is a diagonal matrix of singular values (factors × factors)
  • V^T contains game feature vectors (factors × games)

The Step-by-Step SVD Process

Step 1: Building the Ratings Matrix R

We start with a sparse ratings matrix where each entry R[i,j] represents user i's rating for game j:

Ratings Matrix R (simplified example):
        Game1  Game2  Game3  Game4  Game5
User1     8      ?      6      ?      9
User2     ?      4      ?      3      ?
User3     7      ?      ?      5      8
User4     ?      2      9      ?      ?
User5     6      ?      8      4      7

Most entries are missing (represented by ?), creating a sparse matrix. In our real system, we have hundreds of thousands of users and tens of thousands of games, with each user typically rating only a small fraction of all games.
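As a sketch, this is what the decomposition looks like in NumPy on the toy matrix above. The zero-filling of missing entries is purely to make the example runnable; the production pipeline (Surprise's SVD) fits only the observed ratings.

```python
import numpy as np

# Toy ratings matrix from above; missing ratings are zero-filled here only so
# the example runs. The real system factors observed entries only.
R = np.array([
    [8, 0, 6, 0, 9],
    [0, 4, 0, 3, 0],
    [7, 0, 0, 5, 8],
    [0, 2, 9, 0, 0],
    [6, 0, 8, 4, 7],
], dtype=float)

# Full SVD: R = U @ diag(s) @ Vt, singular values sorted largest first
U, s, Vt = np.linalg.svd(R, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (5, 5) (5,) (5, 5)

# Keep only the top-k factors to get a low-rank approximation of R
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_approx, 1))
```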

Step 2: Matrix Factorization into U, Σ, and V^T

SVD decomposes the ratings matrix R into three components:

Matrix U (User Features): Each row represents a user as a vector in the latent factor space:

U Matrix (users × 15 factors):
          F1    F2    F3   ...   F15
User1    0.2  -0.8   0.4   ...   0.1
User2   -0.5   0.3  -0.2   ...   0.6
User3    0.8  -0.1   0.9   ...   0.3
...

Matrix Σ (Singular Values): A diagonal matrix containing the importance of each factor:

Σ Matrix (15 × 15 diagonal):
Factor1:  45.2
Factor2:  32.1
Factor3:  28.7
...
Factor15:  2.3

Matrix V^T (Game Features): Each column represents a game as a vector in the same latent factor space:

V^T Matrix (15 factors × games):
           Game1  Game2  Game3  ...
Factor1     0.6   -0.3    0.8   ...
Factor2    -0.2    0.7   -0.4   ...
Factor3     0.9    0.1    0.6   ...
...

Step 3: Understanding the Role of Singular Values

The singular values in the Σ matrix serve several critical purposes:

  1. Factor Importance Ranking: Larger singular values indicate more important factors that explain more variance in the ratings. In our example:

    • Factor 1 (σ=45.2) might represent "strategy vs. luck preference"
    • Factor 2 (σ=32.1) might represent "game complexity preference"
    • Factor 15 (σ=2.3) captures much less important patterns or noise
  2. Dimensionality Reduction: We typically keep only the top k factors (k=15 in our case) and discard factors with small singular values. This reduces noise and computational complexity while retaining the most meaningful patterns.

  3. Variance Explanation: The proportion of total variance explained by factor i is σ²ᵢ / Σ(σ²ⱼ). This tells us how much each factor contributes to explaining user preferences.
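With the illustrative singular values from the example (factors 4 through 14 are invented here just to make the arithmetic runnable), the variance-explained fractions can be computed directly:

```python
# Singular values from the example above; the middle values are hypothetical
# filler so that all 15 factors are present.
sigmas = [45.2, 32.1, 28.7, 20.0, 15.5, 12.0, 10.1, 8.8, 7.4,
          6.0, 5.1, 4.2, 3.5, 2.9, 2.3]

# Proportion of variance explained by factor i is sigma_i^2 / sum(sigma_j^2)
total = sum(s ** 2 for s in sigmas)
explained = [s ** 2 / total for s in sigmas]

for i, frac in enumerate(explained[:3], start=1):
    print(f"Factor {i} explains {frac:.1%} of the variance")
```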

Step 4: Converting to User and Game Vectors

In our implementation, we use biased=False, which simplifies the decomposition. The final user and game vectors are computed as:

  • User vectors: U × √Σ (incorporating singular values into user representations)
  • Game vectors: V^T × √Σ (incorporating singular values into game representations)

This gives us the final vectors:

user_vectors = svd.pu  # Shape: (n_users, 15) - stored in Redis
game_vectors = svd.qi  # Shape: (n_games, 15) - stored in PostgreSQL

Each user becomes a 15-dimensional vector representing their preferences across latent factors:

User1_vector = [0.8, -1.2, 0.6, 0.3, -0.9, 0.1, 0.7, -0.4, 0.5, 0.2, -0.3, 0.8, 0.1, -0.6, 0.4]

Each game becomes a 15-dimensional vector representing its characteristics along the same factors:

Game1_vector = [0.9, -0.8, 0.7, 0.2, -0.5, 0.3, 0.6, -0.1, 0.4, 0.8, -0.2, 0.5, 0.3, -0.7, 0.1]

Step 5: Making Predictions

With biased=False, predicting a rating becomes a simple dot product:

predicted_rating = user_vector · game_vector
predicted_rating = Σ(user[i] × game[i]) for i = 1 to 15

For example, summing all 15 terms for the vectors above:

User1 rating for Game1 = (0.8×0.9) + (-1.2×-0.8) + (0.6×0.7) + ... = 4.41

This prediction represents how much User1 would likely enjoy Game1 based on the learned patterns from all users' ratings.
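Carrying the dot product through all 15 components of the example vectors above:

```python
user1 = [0.8, -1.2, 0.6, 0.3, -0.9, 0.1, 0.7, -0.4, 0.5, 0.2,
         -0.3, 0.8, 0.1, -0.6, 0.4]
game1 = [0.9, -0.8, 0.7, 0.2, -0.5, 0.3, 0.6, -0.1, 0.4, 0.8,
         -0.2, 0.5, 0.3, -0.7, 0.1]

# Predicted rating is the inner product of the two 15-dimensional vectors
predicted = sum(u * g for u, g in zip(user1, game1))
print(round(predicted, 2))  # 4.41
```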

Mathematical Interpretation

The beauty of this approach is that each factor captures a different aspect of preferences:

  • Factor 1 might represent "complexity preference": positive values indicate users who prefer complex games, negative values prefer simple games
  • Factor 2 might represent "theme preference": positive for fantasy themes, negative for historical themes
  • Factor 3 might represent "interaction level": positive for highly interactive games, negative for solitary games

When we compute the dot product, we're essentially asking: "How well do this user's preferences align with this game's characteristics across all learned factors?"

Why SVD Works for Recommendations

  1. Dimensionality Reduction: SVD identifies the most important latent factors that explain rating patterns, reducing the original sparse matrix to dense, meaningful representations
  2. Noise Reduction: By keeping only the top factors (highest singular values), we filter out noise and focus on robust patterns
  3. Generalization: The learned factors help predict ratings for unseen user-game pairs by capturing fundamental preference patterns
  4. Computational Efficiency: Dense 15-dimensional vectors are much faster to work with than sparse rating matrices

Our Implementation

We use the Surprise library's SVD implementation with specific parameters:

svd = SVD(n_factors=15, biased=False)
  • 15 factors: Balances model complexity with interpretability - enough to capture nuanced preferences without overfitting
  • biased=False: Ensures predictions follow the simple formula: rating = user_vector · game_vector

Collaborative Filtering

The Core Principle

Collaborative filtering assumes that users with similar rating patterns will like similar games. Our approach:

  1. Identify latent factors that explain why users rate games as they do
  2. Learn user preferences for each factor (e.g., strategy vs. luck, complexity vs. simplicity)
  3. Learn game characteristics along the same factors
  4. Predict ratings by matching user preferences with game characteristics

Matrix Factorization Process

Starting with a sparse ratings matrix where most entries are missing:

        Game1  Game2  Game3  Game4
User1     5      ?      3      ?
User2     ?      4      ?      2
User3     3      ?      ?      4
User4     ?      1      5      ?

SVD learns that User1 and User3 have similar preferences, and Game1 and Game3 share characteristics, allowing us to fill in the missing ratings.

PostgreSQL Vector Similarity and Recommendations

Inner Product for Predictions

Our system uses inner product (dot product) for vector similarity and rating prediction:

predicted_rating = user_vector · game_vector = Σ(user[i] × game[i])

This choice is intentional because:

  • With biased=False in SVD, the inner product directly predicts the rating
  • Higher values indicate stronger predicted preference
  • The ranking objective matches the quantity the SVD model was trained to predict

PostgreSQL with pgvector

We use PostgreSQL with pgvector extension for scalable vector similarity search:

Game Vector Storage

Game vectors are stored in PostgreSQL with dedicated vector columns:

CREATE TABLE games (
  id INTEGER PRIMARY KEY,
  game_name TEXT,
  popularity FLOAT,
  vote_average FLOAT,
  vector_4 vector(4),    -- 4-dimensional vectors
  vector_5 vector(5),    -- 5-dimensional vectors
  vector_15 vector(15)   -- 15-dimensional vectors
);

Vector Indexes for Performance

We use IVFFlat indexes optimized for inner product searches:

-- Inner product indexes for fast similarity search
CREATE INDEX ON games USING ivfflat (vector_4 vector_ip_ops);
CREATE INDEX ON games USING ivfflat (vector_5 vector_ip_ops);
CREATE INDEX ON games USING ivfflat (vector_15 vector_ip_ops);

Recommendation Queries

Real-time recommendations use PostgreSQL's native vector operators:

SELECT id, game_name, popularity, vote_average,
       (vector_15 <#> '[user_vector_values]') * -1 AS predicted_rating
FROM games
WHERE vector_15 IS NOT NULL
ORDER BY vector_15 <#> '[user_vector_values]'
LIMIT 25;

Key PostgreSQL Vector Operators:

  • <#>: Negative inner product (for similarity ranking)
  • <->: L2 distance (Euclidean distance)
  • <=>: Cosine distance

Performance Optimization

To ensure accurate results with IVFFlat indexes:

SET ivfflat.probes = 10;  -- Search more clusters for accuracy

User Vector Storage (Redis)

User vectors remain in Redis for fast access:

{
  "user_vector_4": [1.57, -0.61, -0.86, 1.63],
  "user_vector_5": [1.57, -0.61, -0.86, 1.63, 0.23],
  "user_vector_15": [1.57, -0.61, -0.86, 1.63, ...]
}

Hybrid Architecture Benefits

This hybrid approach provides:

  1. PostgreSQL for games: ACID compliance, complex queries, vector indexes
  2. Redis for users: High-speed access, JSON flexibility, caching
  3. Optimal performance: ~1-3ms query times for 25 recommendations from 23K+ games
  4. Scalability: Handles millions of games with sub-linear query complexity

Linear Regression for New Users

The Cold Start Problem

When a new user rates a few games, we need to compute their user vector to make recommendations. This is a linear regression problem.

Mathematical Formulation

Given user ratings for games with known vectors, we solve:

Ax = b

Where:

  • A is the matrix of game vectors (n_games × n_factors)
  • x is the unknown user vector (n_factors × 1)
  • b is the vector of user ratings (n_games × 1)

Solution Methods

  1. Normal Equations (primary method): x = (A^T A)^(-1) A^T b

  2. Gradient Descent (fallback for numerical issues):

    • Iteratively minimize the squared error: ||Ax - b||²
    • Learning rate: 0.001
    • Iterations: 1000
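A NumPy sketch of both solution paths; the game vectors and ratings here are invented, and the production code is the Ruby shown below:

```python
import numpy as np

# Hypothetical data: 6 rated games with 4-dimensional vectors, plus the
# user's ratings for them.
A = np.array([
    [0.9, -0.8, 0.7, 0.2],
    [-0.3, 0.7, -0.4, 0.6],
    [0.6, 0.1, 0.9, -0.2],
    [0.2, 0.5, -0.1, 0.8],
    [0.8, -0.2, 0.5, 0.3],
    [-0.5, 0.3, 0.4, 0.9],
])
b = np.array([8.0, 4.0, 7.0, 5.0, 7.5, 5.5])

# 1. Normal equations: x = (A^T A)^(-1) A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# 2. Gradient descent fallback, minimizing ||Ax - b||^2
x_gd = np.zeros(A.shape[1])
learning_rate = 0.001
for _ in range(1000):
    gradient = 2 * A.T @ (A @ x_gd - b)
    x_gd -= learning_rate * gradient

print(np.round(x_normal, 3))
```

With 1000 iterations at this learning rate, gradient descent only approaches the normal-equations solution; its advantage is robustness when A^T A is ill-conditioned or singular.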

Implementation Details

For new users, we retrieve game vectors from PostgreSQL and solve for user vectors:

require 'matrix'  # Ruby's stdlib Matrix class

def solve_least_squares(game_vectors, ratings)
  a_matrix = Matrix[*game_vectors]
  b_vector = Matrix.column_vector(ratings)

  # Normal equations approach
  at_a = a_matrix.transpose * a_matrix
  at_b = a_matrix.transpose * b_vector

  solve_linear_system(at_a, at_b)
end

# Get game vectors from PostgreSQL
def self.calc_user_vectors(user_ratings, min_ratings)
  # Require a minimum number of ratings before attempting a solve
  return nil if user_ratings.size < min_ratings

  game_ids = user_ratings.map { |rating| rating[:game_id] }
  games = Game.where(id: game_ids).select(:id, vector_column_for(feature_size))

  # Extract vectors and ratings for least squares
  game_vectors = []
  scores = []
  user_ratings.each do |rating|
    game = games.find { |g| g.id == rating[:game_id] }
    next unless game&.send(vector_column_for(feature_size))

    game_vectors << game.send(vector_column_for(feature_size))
    scores << rating[:score]
  end

  # Solve for user vector that best explains their ratings
  solve_least_squares(game_vectors, scores)
end

Statistical Analysis

Z-Score Analysis

To understand what makes users or games unique, we calculate z-scores for each vector component:

z_score = (value - mean) / standard_deviation

High absolute z-scores indicate distinctive characteristics. For example:

  • A user with z-score = +2.5 for factor 7 strongly prefers games high in that characteristic
  • A game with z-score = -3.0 for factor 3 is unusually low in that trait
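A small sketch of the same calculation, with factor values invented for illustration:

```python
from statistics import mean, pstdev

# Hypothetical factor-7 values across a population of users
factor_values = [0.1, -0.4, 0.3, 0.9, -0.2, 0.5, 0.0, -0.7, 0.2, 0.4]

mu = mean(factor_values)
sigma = pstdev(factor_values)  # population standard deviation

def z_score(value):
    return (value - mu) / sigma

# A user at 0.9 sits well above the population mean for this factor
print(round(z_score(0.9), 2))
```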

Caching Strategy

We cache statistical computations for performance:

  1. User statistics: Mean and standard deviation for each factor across all users
  2. Game statistics: Similar statistics for games
  3. Cache invalidation: Based on database size changes and timestamps
  4. Batch processing: Use Redis pipelines for efficient data retrieval

Percentile Calculations

For each game property, we calculate percentiles to understand relative positioning:

def calculate_percentile(property_index, value)
  values = all_games.map { |game| game[:vector][property_index] }.sort
  rank = values.count { |v| v <= value }
  (rank.to_f / values.length * 100).round(1)
end

Implementation Details

Data Pipeline

  1. Raw Ratings: User-game-rating triplets from BoardGameGeek (26M+ ratings, 200K+ users)
  2. SVD Training: Learn user and game vectors using collaborative filtering (Python/Surprise)
  3. PostgreSQL Storage: Store game vectors with metadata using pgvector extension
  4. Redis Storage: Store user vectors with JSON structure for fast access
  5. Index Creation: Build IVFFlat inner product indexes for vector similarity
  6. Real-time Queries: Serve recommendations via PostgreSQL vector search

Storage Architecture

PostgreSQL (Games)

-- 23,000+ games with multiple vector dimensions
CREATE TABLE games (
  id INTEGER PRIMARY KEY,
  game_name TEXT NOT NULL,
  popularity FLOAT,
  vote_average FLOAT,
  vector_4 vector(4),
  vector_5 vector(5),
  vector_15 vector(15)
);

-- Inner product indexes for each dimension
CREATE INDEX ON games USING ivfflat (vector_4 vector_ip_ops);
CREATE INDEX ON games USING ivfflat (vector_5 vector_ip_ops);
CREATE INDEX ON games USING ivfflat (vector_15 vector_ip_ops);

Redis (Users)

// User vectors in JSON format
{
  "user:$username": {
    "user_vector_4": [1.57, -0.61, -0.86, 1.63],
    "user_vector_5": [1.57, -0.61, -0.86, 1.63, 0.23],
    "user_vector_15": [1.57, -0.61, ..., 0.45]
  }
}

Performance Optimizations

  1. Vector Indexes: IVFFlat indexes reduce search from O(n) to O(log n) complexity
  2. Probe Tuning: SET ivfflat.probes = 10 balances accuracy vs. speed
  3. Batch Operations: PostgreSQL bulk inserts for vector updates
  4. Connection Pooling: Rails ActiveRecord connection management
  5. Caching: Redis for user vector caching and statistical computations

Error Handling and Validation

  1. Minimum Ratings: Require at least 5 ratings for reliable user vectors
  2. RMSE Calculation: Monitor prediction quality during vector computation
  3. Fallback Methods: Use gradient descent when matrix operations fail
  4. Data Validation: Ensure vectors have correct dimensionality and numeric types

Scalability Considerations

  • Memory Usage: 15-dimensional vectors are compact yet expressive
  • Search Performance: PostgreSQL pgvector scales to millions of games with sub-linear complexity
  • Hybrid Storage: PostgreSQL ACID properties for games, Redis speed for users
  • Update Strategy: Batch PostgreSQL updates for games, real-time Redis updates for users
  • Distributed Computing: Can be extended to PostgreSQL replicas and Redis clusters

Query Performance Metrics

Real-world performance measurements:

  • Vector similarity search: 1-3ms for top 25 recommendations from 23K+ games
  • Index effectiveness: ~200 inner products calculated vs. 23,000 brute force
  • Accuracy: IVFFlat with probes=10 provides near-exact results
  • Throughput: Handles hundreds of concurrent recommendation requests

Critical Implementation Fix: Inner Product vs L2 Distance

Previous Implementation Issue: The system was initially using L2 distance (<->) for similarity ranking:

-- INCORRECT: Using geometric distance instead of predicted rating
ORDER BY vector_15 <-> '[user_vector]'

This approach was mathematically incorrect because:

  1. L2 distance measures geometric proximity, not rating prediction quality
  2. Games could be "close" in vector space but have low predicted ratings
  3. Results were meaningless (e.g., "My Little Pony Hide & Seek" as top recommendation)

Corrected Implementation: Now using inner product (<#>) for proper collaborative filtering:

-- CORRECT: Using inner product to maximize predicted rating
ORDER BY vector_15 <#> '[user_vector]'

Impact of the Fix:

  • Before: Random, nonsensical recommendations
  • After: High-quality recommendations like Gloomhaven series, Pandemic Legacy
  • Mathematical alignment: Now properly implements the SVD prediction formula
  • User experience: Recommendations went from meaningless to highly relevant

This fix demonstrates the critical importance of using the correct mathematical operation for the underlying model: geometric similarity is not the same as a collaborative filtering prediction.
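The difference is easy to reproduce with two toy game vectors: one geometrically close to the user, one farther away but with a much higher inner product, and therefore a higher predicted rating:

```python
import math

def l2(a, b):
    return math.dist(a, b)  # Euclidean distance, like pgvector's <->

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))  # like pgvector's <#> (negated)

user = [2.0, 2.0]
games = {"A": [2.1, 1.9], "B": [5.0, 5.0]}

nearest = min(games, key=lambda g: l2(user, games[g]))
best_rated = max(games, key=lambda g: inner_product(user, games[g]))
print(nearest, best_rated)  # A B
```

Game A is the nearest neighbor, but Game B is the one the model predicts the user would rate highest, which is exactly why ranking by L2 distance produced nonsense.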

Mathematical Intuition

What Do the Factors Represent?

While the 15 factors are learned automatically, they often correspond to interpretable game characteristics:

  • Factor 1: Strategy vs. Luck
  • Factor 2: Game Complexity
  • Factor 3: Player Interaction Level
  • Factor 4: Game Duration
  • Factor 5: Theme Preference (Fantasy vs. Historical)
  • etc.

Users with positive values for "Strategy" factor will be recommended strategy games, while those with negative values might prefer luck-based games.

Prediction Accuracy

The system's effectiveness comes from:

  1. Large Dataset: Millions of ratings provide a robust statistical foundation
  2. Appropriate Dimensionality: 15 factors capture complexity without overfitting
  3. Quality Metrics: RMSE tracking ensures model performance
  4. Cross-Validation: Train/test splits validate generalization

This mathematical foundation enables our system to provide personalized, accurate recommendations while remaining computationally efficient and scalable.

Myers-Briggs Type Indicator for Board Games

Psychological Background

The Myers-Briggs Type Indicator (MBTI) is a widely used personality assessment tool in psychology, developed by Katharine Briggs and Isabel Myers based on Carl Jung's theory of psychological types. The MBTI categorizes personalities using four binary dimensions:

  • Extraversion (E) vs. Introversion (I): How you direct your energy
  • Sensing (S) vs. Intuition (N): How you process information
  • Thinking (T) vs. Feeling (F): How you make decisions
  • Judging (J) vs. Perceiving (P): How you approach the outside world

These four binary choices create 16 distinct personality types (like ENFP, ISTJ, etc.), each representing a unique combination of cognitive preferences and behavioral tendencies.

Application to 4-Feature Models

The Myers-Briggs framework can be applied to any system with exactly 4 latent features that can be reduced to binary characteristics. In our board game recommendation system, when using 4 features, we create a gaming personality type by:

  1. Calculating the mean for each of the 4 features across all users
  2. Comparing each user's feature values to these population means
  3. Creating a 4-bit binary pattern where each bit represents whether the user is above (1) or below (0) the mean for that feature
  4. Converting the binary pattern to a readable 4-letter code using our mapping function

This approach transforms continuous preference data into discrete personality categories, making user preferences more interpretable and comparable.
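The four steps can be sketched as follows. The population means and the choice of which letter corresponds to "above the mean" are assumptions for illustration; the real mapping lives in the site's code.

```python
# Hypothetical population means for the 4 features
MEANS = [0.0, 0.1, -0.2, 0.3]

# (below-mean letter, above-mean letter) per dimension; direction assumed
LETTERS = [("C", "S"), ("I", "A"), ("M", "V"), ("E", "O")]

def gaming_type(user_vector):
    """Convert a 4-dimensional user vector into a 4-letter gaming type."""
    code = ""
    for value, mu, (below, above) in zip(user_vector, MEANS, LETTERS):
        code += above if value > mu else below
    return code

print(gaming_type([0.5, -0.3, 0.1, 0.4]))  # SIVO
```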

The Four Board Game Personality Dimensions

Based on analysis of games with the highest and lowest values for each feature in our 4-feature model, we can infer the psychological dimensions these factors represent:

S/C - Simple vs. Complex

  • S (Simple): Light, accessible games with straightforward rules and quick gameplay
    • High games: Party games like "Greasy Spoon", "Meridians", "Tide of Fortune"
    • Low games: Heavy campaign games like "Gloomhaven", "Frosthaven", "HATE"
  • C (Complex): Deep, intricate games requiring significant time investment and rules mastery
    • Values strategic depth and mechanical complexity over accessibility

A/I - Amicable vs. Intense

  • A (Amicable): Social, party-oriented games emphasizing fun group interactions
    • High games: Party games like "Charades", "Taboo", "Pictionary", "Articulate!"
    • Low games: Heavy combat/miniature games like "HATE", "Frosthaven"
  • I (Intense): Strategic, competitive games with serious gameplay and deep engagement
    • Appeals to those seeking challenging, immersive gaming experiences

V/M - Vintage vs. Modern

  • V (Vintage): Traditional games with classic, time-tested designs
    • High games: Classic games like "Tic-Tac-Toe", "Busen Memo", "Magic Realm"
    • Low games: Contemporary Euro-style games like "Ark Nova"
  • M (Modern): Contemporary games featuring innovative mechanics and current design philosophy
    • Values modern production quality and evolved gameplay mechanics

O/E - Offensive vs. Evasive

  • O (Offensive): Direct conflict games with high player interaction and confrontation
    • High games: Conflict-driven games like "Oath", "Chess", "Root"
    • Low games: Solo-capable games like "Unbroken", low-interaction games
  • E (Evasive): Games minimizing direct conflict, focusing on indirect competition or solo play
    • Prioritizes peaceful gameplay and personal achievement over confrontation

Gaming Personality Types

This system generates 16 distinct gaming personalities:

  • SAVO: Simple-Amicable-Vintage-Offensive - Classic party games with direct competition (e.g., Charades)
  • SAVE: Simple-Amicable-Vintage-Evasive - Traditional family games with indirect competition (e.g., Bingo)
  • SAMO: Simple-Amicable-Modern-Offensive - Modern party games with player interaction (e.g., Exploding Kittens)
  • SAME: Simple-Amicable-Modern-Evasive - Contemporary casual games with minimal conflict (e.g., Sushi Go!)
  • SIVO: Simple-Intense-Vintage-Offensive - Classic competitive games requiring focus (e.g., Chess)
  • SIVE: Simple-Intense-Vintage-Evasive - Traditional solo puzzles and brain teasers (e.g., Solitaire)
  • SIMO: Simple-Intense-Modern-Offensive - Modern quick competitive games (e.g., Love Letter)
  • SIME: Simple-Intense-Modern-Evasive - Contemporary solo/puzzle games (e.g., Sagrada)
  • CAVO: Complex-Amicable-Vintage-Offensive - Classic social strategy games (e.g., Diplomacy)
  • CAVE: Complex-Amicable-Vintage-Evasive - Traditional cooperative complex games (e.g., Bridge)
  • CAMO: Complex-Amicable-Modern-Offensive - Modern social deduction and negotiation games (e.g., Secret Hitler)
  • CAME: Complex-Amicable-Modern-Evasive - Contemporary cooperative Euro games (e.g., Pandemic)
  • CIVO: Complex-Intense-Vintage-Offensive - Classic war and conquest games (e.g., Risk)
  • CIVE: Complex-Intense-Vintage-Evasive - Traditional heavy solo experiences (e.g., Magic Realm)
  • CIMO: Complex-Intense-Modern-Offensive - Modern competitive heavy games (e.g., Root)
  • CIME: Complex-Intense-Modern-Evasive - Contemporary heavy Euro games (e.g., Ark Nova)

Each combination reveals a unique gaming personality profile, helping to explain why certain games resonate with specific players and enabling more targeted recommendations based on deeper psychological preferences rather than just rating patterns.