People You May Know

Problem Statement

People You May Know (PYMK) is a list of users with whom you may want to connect based on things you have in common, such as a mutual friend, school, or workplace.

Clarifications or Assumptions

  1. What’s the objective/motivation?
    • May I assume the motivation of the PYMK feature is to help users discover potential connections and grow their network?
  2. How to define ‘connections’?
    • May I assume that people are friends only if each is a friend of the other?
  3. What should I consider for forming connections?
    • To recommend potential connections, a huge number of factors could be considered, such as educational background, work experience, existing connections, and historical activity. Should I focus on the most important ones?
  4. Define the scale
    • How many connections does an average user have? (like 1000)
    • What’s the total number of users on this platform? (like 1 billion)
    • How many of them are daily active users? (like 300 million)
  5. Model Dynamics
    • May I assume that the social graph of most users is not highly dynamic, meaning that their connections don’t change significantly over a short period?

Frame as an ML Business Objective

Business Objective

The motivation for building the system is to enable users to discover new connections more easily and grow their networks.

Success is measured by increased user engagement with the network (connection requests sent and accepted).

ML Objective

  • Maximize the number of formed connections between users (a link prediction or ranking problem). The ultimate metric is how many recommended connections actually convert into real connections
  • Define the input and the output
    • Input: a user
    • Output: a ranked list of users to recommend (a minimal end-to-end sketch follows below)
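To make this framing concrete, here is a minimal end-to-end sketch of the interface, using a toy in-memory friendship map and a mutual-connection score as stand-ins for the real candidate generation and ranking stages described later in this post:

```python
from typing import Dict, List, Set

# Toy stand-in for the real social graph service.
FRIENDS: Dict[int, Set[int]] = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}

def generate_candidates(user_id: int) -> Set[int]:
    """Friends-of-friends who are not already connected to the user."""
    direct = FRIENDS.get(user_id, set())
    friends_of_friends = set().union(*(FRIENDS.get(f, set()) for f in direct)) if direct else set()
    return friends_of_friends - direct - {user_id}

def score(user_id: int, candidate: int) -> float:
    """Toy score: the number of mutual connections."""
    return float(len(FRIENDS.get(user_id, set()) & FRIENDS.get(candidate, set())))

def recommend_connections(user_id: int, k: int = 20) -> List[int]:
    """Input: a user. Output: a ranked list of candidate users."""
    candidates = generate_candidates(user_id)
    return sorted(candidates, key=lambda c: score(user_id, c), reverse=True)[:k]

print(recommend_connections(1))  # [4]
```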

Choosing an ML Methodology

There are commonly two approaches for PYMK: Pointwise Learning-To-Rank (LTR) and Edge Prediction.

  1. Pointwise LTR
  • Why this approach:
    • Pointwise LTR transforms the ranking problem into a supervised learning problem: the model takes a (user, candidate connection) pair as input and outputs a score or probability.
    • In cold-start scenarios where rich interaction data is lacking, pointwise LTR can quickly model the similarity between user pairs.
  • Why not this approach:
    • It ignores the broader social context, since each input is an isolated pair of users rather than the full graph.
  2. Edge Prediction
  • The entire social context can be represented as a graph, where each node represents a user, and an edge between two nodes indicates a formed connection between two users.
  • Use the entire graph as input and predict the probability of an edge existing between user A and each other user (a minimal sketch follows below)
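Below is a minimal sketch of the edge-prediction view. It assumes node embeddings are already available (e.g., produced by a GNN or node2vec); random vectors stand in for them here, and the sigmoid-of-dot-product scorer is just one common illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, dim = 6, 16
embeddings = rng.normal(size=(num_users, dim))  # placeholder node embeddings

def edge_probability(u: int, v: int) -> float:
    """Predicted probability that an edge (connection) exists between users u and v."""
    logit = float(embeddings[u] @ embeddings[v])
    return 1.0 / (1.0 + np.exp(-logit))

# Rank all other users for user 0 by predicted edge probability.
scores = {v: edge_probability(0, v) for v in range(num_users) if v != 0}
print(sorted(scores, key=scores.get, reverse=True))
```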

Data Preparation

We typically use three types of raw data:

  • User
    • demographic data
    • educational and work backgrounds, skills
    • interests
  • Connections
    • user pairs (timestamp when connection was formed)
  • Interactions
    • connection request, acceptance, comment, search, profile view, etc.

One challenge with this type of raw data is that a given attribute can be represented in different forms, so it is necessary to standardize the raw data.

Feature Engineering

  • User Features
    • Demographics: age, gender, city, country, etc.
    • The number of connections, followers, accounts followed, and pending requests
    • Account age
    • The number of reactions received
  • User-user affinities
    • Education and work affinity
  • Social affinity
    • Mutual connections
    • Time-discounted mutual connections (see the sketch after this list)
    • Profile views
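As a concrete example of the social-affinity features, here is a minimal sketch of mutual connections and time-discounted mutual connections. It assumes we have each user's connection set and the timestamp when each connection was formed; the 180-day half-life is an arbitrary illustrative choice:

```python
import time
from typing import Dict, Set, Tuple

# Toy data: connection sets and (user, friend) -> unix timestamp when the connection was formed.
connections: Dict[int, Set[int]] = {1: {2, 3, 4}, 5: {2, 3, 6}}
formed_at: Dict[Tuple[int, int], float] = {
    (1, 2): time.time() - 30 * 86400,
    (1, 3): time.time() - 400 * 86400,
    (1, 4): time.time() - 5 * 86400,
    (5, 2): time.time() - 10 * 86400,
    (5, 3): time.time() - 200 * 86400,
    (5, 6): time.time() - 1 * 86400,
}

def mutual_connections(a: int, b: int) -> int:
    return len(connections[a] & connections[b])

def time_discounted_mutual_connections(a: int, b: int, half_life_days: float = 180.0) -> float:
    """Recently formed mutual connections count more than old ones (exponential decay)."""
    now = time.time()
    total = 0.0
    for m in connections[a] & connections[b]:
        age_days = (now - max(formed_at[(a, m)], formed_at[(b, m)])) / 86400
        total += 0.5 ** (age_days / half_life_days)
    return total

print(mutual_connections(1, 5), round(time_discounted_mutual_connections(1, 5), 3))
```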

Modeling

Candidate generation

Candidate generation consists of multiple retrieval techniques, which can be broadly organized into three categories:

  • graph-based (graph walk)
    • N-hop neighbors (N > 1, since 1-hop neighbors are already connections)
      • N is restricted to 3 to control the volume of generated candidates
      • similarity search over GNN embeddings can also surface N-hop candidates
    • Personalized PageRank (PPR), to run deeper graph walks
      • PPR scores are generated for all (viewer, member) tuples
  • similarity-based
    • Embedding-Based Retrieval (EBR) with standard nearest-neighbor search
    • Two-tower neural network
  • heuristics-based
    • generated by simple rules (also called filters)

Note: GNN embeddings suffer from the graph isomorphism problem, so candidate generation now relies on explicitly running the N-hop and PPR graph walks, as sketched below.
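Here is a minimal in-memory sketch of both graph walks, assuming the social graph fits in a Python adjacency dict; a production system would run these walks on a distributed graph store:

```python
from typing import Dict, Set

graph: Dict[int, Set[int]] = {
    1: {2, 3}, 2: {1, 4}, 3: {1, 4, 5}, 4: {2, 3, 6}, 5: {3}, 6: {4},
}

def n_hop_candidates(source: int, max_hops: int = 3) -> Set[int]:
    """Users reachable in 2..max_hops hops (1-hop neighbors are already connections)."""
    frontier, visited = {source}, {source}
    hops: Dict[int, int] = {}
    for hop in range(1, max_hops + 1):
        frontier = {n for u in frontier for n in graph.get(u, set())} - visited
        visited |= frontier
        for n in frontier:
            hops[n] = hop
    return {n for n, h in hops.items() if h >= 2}

def personalized_pagerank(source: int, alpha: float = 0.15, iters: int = 50) -> Dict[int, float]:
    """PPR by power iteration: a random walk that restarts at `source` with probability alpha."""
    nodes = list(graph)
    rank = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (alpha if n == source else 0.0) for n in nodes}
        for u in nodes:
            share = (1 - alpha) * rank[u] / max(len(graph[u]), 1)
            for v in graph[u]:
                nxt[v] += share
        rank = nxt
    return rank

print(sorted(n_hop_candidates(1)))                                         # [4, 5, 6]
print(sorted(personalized_pagerank(1).items(), key=lambda kv: -kv[1])[:3])
```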

Light ranker

A lightweight model cheaply scores the large candidate pool so that only the most promising candidates reach the heavy ranker. Common choices (a minimal training sketch follows):

  • Logistic regression
  • XGBoost
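A minimal training sketch for the XGBoost variant, assuming pairwise features (normalized mutual connections, same-company flag, same-school flag) and a binary "connection formed" label; the data here is synthetic and purely illustrative:

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Columns: [mutual_connections_normalized, same_company, same_school]
X = rng.random((10_000, 3))
y = (0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.2, 10_000) > 0.5).astype(int)

light_ranker = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
light_ranker.fit(X, y)

# Score candidates and keep only a few hundred for the heavy ranker.
candidate_features = rng.random((1_000, 3))
scores = light_ranker.predict_proba(candidate_features)[:, 1]
top_for_heavy_ranker = np.argsort(-scores)[:300]
```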

Heavy ranker

  • Deep neural networks that score the smaller, pruned candidate set with richer features (a minimal sketch follows)
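A minimal PyTorch sketch of such a heavy ranker; the feature dimension and architecture are illustrative assumptions, not a production design:

```python
import torch
import torch.nn as nn

class HeavyRanker(nn.Module):
    def __init__(self, num_features: int = 64, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pair_features: torch.Tensor) -> torch.Tensor:
        # Probability that the viewer sends a request and the candidate accepts it.
        return torch.sigmoid(self.net(pair_features)).squeeze(-1)

model = HeavyRanker()
batch = torch.randn(8, 64)   # 8 (viewer, candidate) feature vectors
print(model(batch).shape)    # torch.Size([8])
```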

Re-ranker

  • Ensure fairness, e.g. avoid outcomes that overrepresent platform power users (a rule-based sketch follows)
  • Employ Bayesian optimization to tune the most important re-ranking parameters
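As one illustration of a fairness rule, here is a sketch that caps how many "power users" (here, accounts above an assumed follower threshold) can appear in the final list; the threshold and cap are arbitrary parameters of the kind Bayesian optimization could tune:

```python
from typing import List, Tuple

def rerank(scored: List[Tuple[int, float, int]], k: int = 10,
           follower_threshold: int = 100_000, max_power_users: int = 3) -> List[int]:
    """scored: (candidate_id, heavy_ranker_score, follower_count) triples."""
    result: List[int] = []
    power_used = 0
    for cand, _, followers in sorted(scored, key=lambda t: -t[1]):
        is_power = followers >= follower_threshold
        if is_power and power_used >= max_power_users:
            continue  # skip: the power-user quota for this list is already filled
        result.append(cand)
        power_used += int(is_power)
        if len(result) == k:
            break
    return result
```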

Evaluation

Offline metrics:

  • Recall@k at the candidate generation stage and light ranker (a minimal sketch of Recall@k/Precision@k follows this list)
  • AUC and Precision@k at the heavy ranker
  • Log-likelihood and diversity metrics at the re-ranker
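A minimal sketch of the two offline metrics, assuming `relevant` is the set of users the viewer actually connected with during the evaluation window and `ranked` is the model's ordered output:

```python
from typing import List, Set

def recall_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    return len(set(ranked[:k]) & relevant) / k

print(recall_at_k([4, 7, 9, 2], relevant={4, 2, 11}, k=3))     # 0.333...
print(precision_at_k([4, 7, 9, 2], relevant={4, 2, 11}, k=3))  # 0.333...
```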

Online metrics:

  • A/B tests
    • The total number of connection requests sent in the last X days
    • The total number of connection requests accepted in the last X days

Offline metrics rarely match online performance, for several reasons:

  • Presentation biases (missing profile photo, laggy UI, position bias, etc.).
  • Unintentional errors in deploying models to production.
  • Discrepancy between the distribution of offline training data and online data.

Serving

Online vs. batch:

  • Online: predictions are computed at request time, which may add noticeable latency
  • Batch: pre-compute potential connections for all users
    • may waste computation on users who never show up
    • acceptable because the social graph does not evolve quickly
    • to optimize further, pre-compute PYMK only for active users (a minimal sketch follows this list)
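A minimal sketch of the batch option: a periodic job pre-computes PYMK for active users only and writes the results to a key-value store, so serving becomes a cheap lookup. `recommend_connections` refers to the end-to-end sketch from the framing section, and the store is a plain dict stand-in:

```python
from typing import Dict, List

pymk_store: Dict[int, List[int]] = {}   # stand-in for a real key-value store

def nightly_pymk_job(active_user_ids: List[int], k: int = 100) -> None:
    """Periodically refresh pre-computed recommendations for active users only."""
    for user_id in active_user_ids:
        pymk_store[user_id] = recommend_connections(user_id, k)  # pipeline sketched earlier

def serve_pymk(user_id: int) -> List[int]:
    """Serving is a cheap lookup; a cold miss could fall back to the online path."""
    return pymk_store.get(user_id, [])
```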

Appendix

Graph-based

In general, a graph represents relations (edges) between a collection of entities (nodes). Graphs can store structural data, and there are three general types of prediction tasks that can be performed on data represented as graphs:

  • Graph-level
    • e.g., predicting a property of an entire chemical compound
  • Edge-level
    • e.g., PYMK on social media
  • Node-level
    • e.g., predicting whether a user is a spammer, given a social network graph


Author: Jie Sun

Posted on: 2025-04-23

Updated on: 2025-09-13
