Learning Goal: I’m working on a machine learning question and need guidance to help me learn.
1 Concepts of Learning (Ed Tam)
The goal of this question is to familiarize you with the conceptual taxonomy of different types of machine
learning tasks and to get you thinking about how ML could be applied in real life.
Some common types of machine learning tasks/problems are listed below:
• conditional probability estimation
• density estimation
You are given a list of situations below. Assign one machine learning task from the list above to each situation.
The situations are deliberately designed to simulate real-life applications and hence are open-ended. For
each situation, there will be more than one answer that could be appropriate, depending on your interpretation
of the data/task. You only have to give ONE reasonable answer for each situation to get
full credit. Please limit any justification/explanation to at most two sentences.
You love betting on horse races. One popular betting mechanism is that if you guess the 1st, 2nd and
3rd place horses in a race correctly, you will win a grand prize. You have features about each horse,
and would like to train an ML algorithm to create bets that would maximize the chances of winning a
You are an astronomer studying galaxies. You have used the James Webb Space Telescope to collect
10,000 images of different galaxies. Galaxies can be of 4 different types: spiral, elliptical, peculiar and
irregular. You hand-labelled 100 of these images yourself using your expert knowledge, and you would
like an ML algorithm to distinguish what types of galaxies the remaining images correspond to.
You are a labor economist who wants an ML algorithm that can predict individuals’
post-college incomes. You have a dataset on 1000 people, with data on each person’s characteristics
(e.g., educational level, what state they are from) and their starting salary in their first post-college job.
You are the owner of a restaurant that is famous for your vegetable soup. You are trying to determine
how many pounds of vegetables to buy for next week. If you buy too much, the leftover vegetables go
to waste. If you buy too little, you will run out of vegetables prematurely. You have a database that
contains data about all past weeks (whether there’s a holiday coming up, what the weather is likely to
be, and how many customers you had that week, etc.).
You are a product analyst at Forever 21. You discover that different products sell at drastically
different rates depending on the current fashion trends and the current season. You would like to use
the available sales data to uncover some of the main current trends and correlations between different
You are a UX researcher at a social network company. You are wondering whether the introduction
of a new option will affect how much time users spend watching videos on the platform. In particular,
you would like to learn about how the introduction of this new option will change the mean, kurtosis,
and variance of the video-watch-time distribution. You start collecting data after the new option is introduced.
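As a small illustration of the three quantities named in this situation, here is a minimal Python sketch that computes the mean, variance, and kurtosis of a sample. It is only a sketch under stated assumptions: the helper `distribution_summary` and the watch-time values are hypothetical, the variance is the population version (dividing by n), and kurtosis is taken as the fourth standardized moment rather than excess kurtosis.

```python
from statistics import fmean, pvariance

def distribution_summary(samples):
    """Return the mean, variance, and kurtosis of a list of numbers."""
    mean = fmean(samples)
    variance = pvariance(samples, mu=mean)  # population variance (divide by n)
    # Kurtosis as the fourth standardized moment: E[(x - mean)^4] / variance^2.
    fourth_moment = fmean([(x - mean) ** 4 for x in samples])
    kurtosis = fourth_moment / variance ** 2
    return {"mean": mean, "variance": variance, "kurtosis": kurtosis}

# Hypothetical watch times in minutes, made up for illustration (not real data).
watch_minutes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = distribution_summary(watch_minutes)
```

Running the same summary on data collected before and after the change would let the three quantities be compared directly.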
You are a salesperson who needs to pursue a set of customers. You have features about each of them,
and would like to pursue them according to how likely it is that you will make a sale.
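To make "pursue them according to how likely it is that you will make a sale" concrete, here is a minimal Python sketch of the ordering step alone. It assumes some model has already produced a sale-probability score for each customer; the customer names, the scores, and the helper `rank_by_sale_likelihood` are hypothetical, and the sketch says nothing about how such scores would be learned.

```python
def rank_by_sale_likelihood(scores):
    """Return customer names ordered from most to least likely to buy."""
    # Sort by score, highest first; tied customers keep their original order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]

# Hypothetical scores, made up for illustration (not real data).
estimated_sale_probability = {
    "customer_a": 0.15,
    "customer_b": 0.72,
    "customer_c": 0.40,
}
pursuit_order = rank_by_sale_likelihood(estimated_sale_probability)
```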
You are a mortgage specialist at Fannie Mae. You are trying to decide whether to extend a mortgage
to a company for them to buy an office. You have a model that determines the likelihood of the
company defaulting. You have some newly collected data which shows that the company has had a
steady stream of cash flow for the past 5 years and has never defaulted on any loans before. In light
of this new information, you would like the model to update its estimate of default.
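One way to picture "updating an estimate in light of new information" is Bayes' rule. The sketch below is a minimal Python illustration, not the method any real lender uses: the function name and all three probabilities are assumptions chosen only to show the arithmetic.

```python
def updated_default_probability(prior, p_evidence_given_default,
                                p_evidence_given_no_default):
    """Apply Bayes' rule to get P(default | evidence)."""
    numerator = p_evidence_given_default * prior
    # Total probability of seeing the evidence, with or without default.
    evidence = numerator + p_evidence_given_no_default * (1.0 - prior)
    return numerator / evidence

# Hypothetical numbers: a 10% prior chance of default, and evidence
# (steady cash flow, clean loan history) assumed to be seen in 20% of
# defaulting companies versus 70% of non-defaulting ones.
posterior = updated_default_probability(
    prior=0.10,
    p_evidence_given_default=0.2,
    p_evidence_given_no_default=0.7,
)
```

Under these assumed numbers the estimate falls from 0.10 to roughly 0.03, because the evidence is more typical of companies that do not default.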
You are a radiologist working on applying ML research in assisting medical diagnosis. In particular,
you would like to develop an ML algorithm that can take in MRI scans of individual patients and
output how likely it is that the patient is suffering from a malignant tumor.
You are a biologist studying the genetic lineage of different species. You have genomic data (a genome
is the collection of all genes/genetic materials present in an organism) on 100 different species, and you
would like to group these species according to their “genetic distance” from one another.
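As a rough illustration of grouping items by pairwise distance, here is a minimal Python sketch of single-linkage grouping with a distance cutoff. Everything in it is an assumption for illustration: the helper `single_linkage_groups`, the placeholder species names, the distance values, and the threshold are all made up, and a real analysis would more likely use an established clustering library.

```python
def single_linkage_groups(names, distance, threshold):
    """Merge groups whose closest members are at most `threshold` apart.

    `distance` maps frozenset({a, b}) to a non-negative number.
    """
    groups = [{name} for name in names]  # start with one group per item
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Single linkage: group distance is that of the closest pair.
                closest = min(distance[frozenset((a, b))]
                              for a in groups[i] for b in groups[j])
                if closest <= threshold:
                    groups[i] |= groups.pop(j)
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return groups

# Hypothetical pairwise "genetic distances", made up for illustration.
species = ["species_A", "species_B", "species_C", "species_D"]
pairwise = {
    frozenset(("species_A", "species_B")): 0.05,
    frozenset(("species_C", "species_D")): 0.10,
    frozenset(("species_A", "species_C")): 0.60,
    frozenset(("species_A", "species_D")): 0.65,
    frozenset(("species_B", "species_C")): 0.60,
    frozenset(("species_B", "species_D")): 0.70,
}
groups = single_linkage_groups(species, pairwise, threshold=0.2)
```

With these made-up distances the sketch ends with two groups: species_A with species_B, and species_C with species_D.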