2019-04-24
Heuristic solution
Although machine learning (ML) is commonly used to build recommendation systems, it is not the only solution. In many cases we want a simpler approach: for example, we may have very little data, or we may want to build a minimal solution quickly.
In such cases, we can start with heuristics. In fact, there are many simple tricks for building a basic recommendation system. For instance, based on the videos a user has watched, we can simply suggest videos from the same authors. We can also suggest videos with similar titles or labels. If we add popularity (number of comments and shares) as another signal, the result can work quite well as a baseline.
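As a rough illustration of such a baseline, here is a minimal sketch that scores unwatched videos by shared author, shared labels, and popularity. The catalog layout, the field names, and the weights (2.0, 1.0, 0.5) are all assumptions made up for this example, not something the approach prescribes.

```python
from collections import Counter

def heuristic_recommend(watched_ids, catalog, k=3):
    """Rank unwatched videos by same author, label overlap, and popularity."""
    watched = [catalog[vid] for vid in watched_ids]
    authors = {video["author"] for video in watched}
    labels = Counter(label for video in watched for label in video["labels"])
    # Normalize popularity by the most popular video (guard against all zeros).
    max_pop = max(v["comments"] + v["shares"] for v in catalog.values()) or 1

    scores = {}
    for vid, video in catalog.items():
        if vid in watched_ids:
            continue
        same_author = 1.0 if video["author"] in authors else 0.0
        label_overlap = sum(labels[label] for label in video["labels"])
        popularity = (video["comments"] + video["shares"]) / max_pop
        # Hypothetical weights; in practice these would be tuned.
        scores[vid] = 2.0 * same_author + 1.0 * label_overlap + 0.5 * popularity
    # Highest score first; ties broken by video id for determinism.
    return sorted(scores, key=lambda vid: (-scores[vid], vid))[:k]

# Toy catalog (made-up data).
catalog = {
    "v1": {"author": "alice", "labels": ["cooking"], "comments": 10, "shares": 5},
    "v2": {"author": "alice", "labels": ["cooking", "vegan"], "comments": 3, "shares": 1},
    "v3": {"author": "bob", "labels": ["travel"], "comments": 50, "shares": 50},
    "v4": {"author": "carol", "labels": ["cooking"], "comments": 8, "shares": 2},
}
recommendations = heuristic_recommend({"v1"}, catalog)
print(recommendations)
```

With this toy data, the same-author video comes first, then the video sharing a label, then the merely popular one.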
Collaborative filtering
No discussion of recommendation systems is complete without collaborative filtering (CF), one of the most widely used techniques in the field. Since not everyone has a machine learning background, I won’t go deep into the algorithm. In fact, the beauty of collaborative filtering is that the basic idea is simple enough for anyone to understand.
In a nutshell, to recommend videos to a user, I can suggest videos liked by similar users. For instance, if users A and B have watched many of the same videos, user A is likely to enjoy other videos that B liked. Of course, there are many ways to define “similar” here. It could mean that two users have liked the same videos, or that they share the same location.
The above algorithm is called user-based collaborative filtering. Another version, item-based collaborative filtering, recommends videos (items) that are similar to the videos a user has watched.
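To make the idea concrete, here is a small sketch that defines "similar" as Jaccard overlap between two users' sets of liked videos. Jaccard similarity is just one possible choice picked for this example, and the `likes` data is made up.

```python
def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_from_similar_users(target, likes, k=3):
    """Suggest videos liked by users whose likes overlap with the target's."""
    scores = {}
    for other, liked in likes.items():
        if other == target:
            continue
        similarity = jaccard(likes[target], liked)
        if similarity == 0.0:
            continue  # users with nothing in common contribute nothing
        # Each video the target has not liked gets credit from similar users.
        for video in liked - likes[target]:
            scores[video] = scores.get(video, 0.0) + similarity
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

# Made-up data: A and B overlap heavily, C a little, D not at all.
likes = {
    "A": {"v1", "v2", "v3"},
    "B": {"v1", "v2", "v3", "v4"},
    "C": {"v3", "v5"},
    "D": {"v6"},
}
for_a = recommend_from_similar_users("A", likes)
print(for_a)
```

Here user B is the closest match to A, so B's extra video ranks ahead of C's, and D's video is never suggested.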
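The item-based variant can be sketched in a similar way. In this example, two videos count as similar when many of the same users watched both, measured by cosine similarity over binary watch vectors. Both the similarity measure and the `watches` data are assumptions for illustration.

```python
from math import sqrt

def item_similarity(watches):
    """Cosine similarity between videos, based on which users watched them."""
    viewers = {}
    for user, videos in watches.items():
        for video in videos:
            viewers.setdefault(video, set()).add(user)
    sims = {}
    for a in viewers:
        for b in viewers:
            if a == b:
                continue
            common = len(viewers[a] & viewers[b])
            if common:
                sims[(a, b)] = common / sqrt(len(viewers[a]) * len(viewers[b]))
    return sims

def recommend_similar_items(user, watches, k=3):
    """Suggest unseen videos that are similar to videos the user has watched."""
    seen = watches[user]
    scores = {}
    for (a, b), sim in item_similarity(watches).items():
        if a in seen and b not in seen:
            scores[b] = scores.get(b, 0.0) + sim
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

# Made-up watch histories.
watches = {
    "u1": {"v1", "v2"},
    "u2": {"v1", "v2", "v3"},
    "u3": {"v2", "v3"},
    "u4": {"v1"},
}
for_u4 = recommend_similar_items("u4", watches)
print(for_u4)
```

Because the similarities depend only on the videos, they can be computed once and reused for every user.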
Feature engineering
So for YouTube video recommendation, which features can be used to build the recommendation system?
Usually, there are two types of features: explicit and implicit. Explicit features can be ratings, favorites, and so on. On YouTube, they can be the like/share/subscribe actions. Implicit features are less obvious. If a user watches a video for only a couple of seconds, it is probably a negative signal. Given a list of recommended videos, if a user clicks one rather than another, it can mean that they prefer the one clicked. Usually, implicit features take much more exploration.
Back to the YouTube problem, several features are quite obvious:
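A minimal sketch of how the two implicit signals described here might be derived from raw events. The thresholds (`skip_after`, `positive_ratio`) are hypothetical values chosen for the example, and treating every unclicked video as less preferred is a simplification.

```python
def watch_signal(watched_seconds, video_seconds, skip_after=5.0, positive_ratio=0.5):
    """Map one watch event to +1 (likely interest), -1 (likely dislike), or 0."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if watched_seconds / video_seconds >= positive_ratio:
        return 1   # watched a large share of the video
    if watched_seconds <= skip_after:
        return -1  # abandoned after a couple of seconds
    return 0       # unclear either way

def click_preferences(shown, clicked):
    """Pairs (preferred, other) implied by clicking one video in a shown list."""
    return [(clicked, other) for other in shown if other != clicked]

print(watch_signal(3, 300))     # quick abandon
print(watch_signal(200, 300))   # mostly watched
print(click_preferences(["v1", "v2", "v3"], "v2"))
```

Checking the watch ratio first keeps a very short video that was watched to the end from being counted as a negative signal.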
  • Like/share/subscribe – As mentioned above, these are strong signals of a user’s preferences.
  • Watch time
  • Video title/labels/categories
  • Freshness
It’s worth noting that when building machine learning systems, you have to experiment a lot with different combinations of features, because you won’t know which ones work well until you try them.
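As a sketch of how the features listed above could be assembled, the following turns one user-video interaction into a flat feature dictionary and then selects a subset, which is one simple way to try different combinations. The field names, the category vocabulary, and the freshness formula `1 / (1 + age_days)` are assumptions for this example.

```python
from datetime import datetime, timezone

# Hypothetical category vocabulary; a real catalog would define its own.
CATEGORIES = ["music", "gaming", "news"]

def build_features(interaction, video, now):
    """Turn one (user, video) interaction into a flat feature dict."""
    age_days = max((now - video["published"]).total_seconds() / 86400.0, 0.0)
    features = {
        "liked": float(interaction["liked"]),
        "shared": float(interaction["shared"]),
        "subscribed": float(interaction["subscribed"]),
        # Fraction of the video watched, capped at 1.0 for rewatches.
        "watch_ratio": min(interaction["watched_seconds"] / video["duration_seconds"], 1.0),
        # Newer videos score closer to 1.0 (assumed decay shape).
        "freshness": 1.0 / (1.0 + age_days),
    }
    # One-hot encode the category.
    for category in CATEGORIES:
        features["category=" + category] = 1.0 if video["category"] == category else 0.0
    return features

def select_features(features, names):
    """Keep only the named features, e.g. to compare feature combinations."""
    return {name: features[name] for name in names}

now = datetime(2019, 4, 24, tzinfo=timezone.utc)
video = {
    "category": "music",
    "duration_seconds": 300,
    "published": datetime(2019, 4, 23, tzinfo=timezone.utc),
}
interaction = {"liked": True, "shared": False, "subscribed": False, "watched_seconds": 150}
features = build_features(interaction, video, now)
subset = select_features(features, ["watch_ratio", "freshness"])
print(features)
print(subset)
```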
Infrastructure
This problem can also be used to discuss infrastructure. Clearly, the system contains multiple steps and components, so how would you design the whole system in terms of infrastructure?
Given that comparing similar users/videos can be time-consuming at YouTube’s scale, this part should be done in offline pipelines. Therefore, we can divide the whole system into an online part and an offline part.
For the offline part, all the user models and videos need to be stored in distributed systems. Pipelines that compute similar users/videos also run regularly to keep the data up to date. In fact, for most machine learning systems, it’s common to process big data in offline pipelines, since you can’t expect that work to finish within a few seconds.
For the online part, based on the user’s profile and actions (such as videos just watched), we should be able to provide a list of recommended videos from the offline data. Normally, the system fetches more videos than needed and then filters and ranks them on the fly. We can filter out videos that are obviously irrelevant, such as videos the user has already watched. Then we rank the remaining suggestions. Factors to consider include video popularity (share/comment/like counts), freshness, quality, and so on.
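A toy version of such an offline step might look like the sketch below: it reads a watch log, counts how often two videos were watched by the same user, and keeps the top neighbors per video. In a real system this would run as a distributed batch job and write to a shared store; here a plain dict stands in for that store, and the log format is an assumption.

```python
from collections import defaultdict
from itertools import combinations

def build_similar_videos_table(watch_log, top_n=2):
    """Batch step: count co-watches and keep the top_n neighbors per video."""
    by_user = defaultdict(set)
    for user, video in watch_log:
        by_user[user].add(video)

    co_counts = defaultdict(lambda: defaultdict(int))
    for videos in by_user.values():
        # Every pair of videos watched by the same user counts once.
        for a, b in combinations(sorted(videos), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    table = {}
    for video, neighbors in co_counts.items():
        # Most co-watched first; ties broken by video id.
        ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
        table[video] = [neighbor for neighbor, _ in ranked[:top_n]]
    return table

# Made-up watch log of (user, video) pairs.
watch_log = [
    ("u1", "v1"), ("u1", "v2"),
    ("u2", "v1"), ("u2", "v2"), ("u2", "v3"),
    ("u3", "v2"), ("u3", "v3"),
]
similar_table = build_similar_videos_table(watch_log)
print(similar_table)
```

Rerunning this job on a schedule is what keeps the lookup table fresh without adding work to the request path.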
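The online path described here (fetch candidates, filter, rank) can be sketched as follows. The precomputed `similar_table`, the per-video `stats`, and the ranking weights (0.7 and 0.3) are all hypothetical stand-ins for whatever the offline pipelines actually produce.

```python
def recommend_online(recent, watched, similar_table, stats, k=2, fetch=10):
    """Serve recommendations from precomputed data: fetch, filter, rank."""
    # 1. Fetch more candidates than needed from the offline table.
    candidates = []
    for video in recent:
        for candidate in similar_table.get(video, [])[:fetch]:
            if candidate not in candidates:
                candidates.append(candidate)

    # 2. Filter out obviously irrelevant videos, e.g. ones already watched.
    candidates = [c for c in candidates if c not in watched]

    # 3. Rank by a weighted mix of popularity and freshness (assumed weights).
    def score(video):
        s = stats[video]
        return 0.7 * s["popularity"] + 0.3 * s["freshness"]

    return sorted(candidates, key=lambda v: (-score(v), v))[:k]

# Made-up offline outputs; popularity and freshness are scaled to [0, 1].
similar_table = {"v1": ["v2", "v3", "v4"], "v2": ["v1", "v5"]}
stats = {
    "v3": {"popularity": 0.9, "freshness": 0.1},
    "v4": {"popularity": 0.2, "freshness": 0.9},
    "v5": {"popularity": 0.5, "freshness": 0.8},
}
served = recommend_online(recent=["v2", "v1"], watched={"v1", "v2"},
                          similar_table=similar_table, stats=stats)
print(served)
```

All the expensive similarity work happens offline, so the online step is limited to lookups, a filter, and a sort over a small candidate list.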
Summary
In reality, there are many ways to improve the system that we haven’t covered yet. I’d like to briefly mention a few techniques:
  • Freshness can be a very important factor. We should figure out how to recommend fresh content.
  • Evaluation is an essential component of a recommendation system; it allows us to understand how well the system works.
  • To train the collaborative filtering system, we may also include video position signals. Usually, videos ranked near the top have a much higher chance of being clicked.
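For the evaluation point, one common starting place is to compare the top-k recommendations against videos the user actually engaged with later. The sketch below computes precision@k and recall@k; the choice of these two metrics and the sample data are assumptions for illustration, and it does not address position bias.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that the user actually engaged with."""
    if k <= 0:
        return 0.0
    hits = sum(1 for video in recommended[:k] if video in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Share of the user's relevant videos that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for video in recommended[:k] if video in relevant)
    return hits / len(relevant)

# Made-up example: a ranked list and the videos the user later watched.
recommended = ["v3", "v5", "v4", "v7"]
relevant = {"v5", "v7", "v9"}
print(precision_at_k(recommended, relevant, 2))
print(recall_at_k(recommended, relevant, 4))
```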

All replies
2019-4-24 09:15:17
Liked your post!

2019-4-24 10:07:27

2019-4-24 11:46:35
Keep it up, OP!

2019-4-24 11:47:54
Liked.

2019-4-24 15:24:14
Thanks for sharing. I'm learning from you. Great post!