Feedback #1

Open
wants to merge 62 commits into
base: feedback
d7ce1a8
Setting up GitHub Classroom Feedback
github-classroom[bot] Jan 29, 2024
2854502
docs: create pull request template
jinmin111 Feb 2, 2024
639c282
docs: create feature request template
jinmin111 Feb 2, 2024
60b452a
docs: create bug report template
jinmin111 Feb 2, 2024
1315a31
feat: add wandb in s3rec
SiwooPark00 Feb 5, 2024
072165c
create gitignore
SiwooPark00 Feb 5, 2024
f158d90
update gitignore
SiwooPark00 Feb 5, 2024
a802c06
Merge pull request #4 from boostcampaitech6/feat/2_add_wandb_s3rec
SiwooPark00 Feb 6, 2024
d875fa0
feat: recbole baseline #5
SiwooPark00 Feb 7, 2024
51403cc
Merge pull request #10 from boostcampaitech6/feat/3-recbole_baseline
arctic890 Feb 7, 2024
f01326a
Create README.md
SiwooPark00 Feb 7, 2024
205d9a2
Update README.md
SiwooPark00 Feb 7, 2024
2ad6651
feat: add EASE
Feb 7, 2024
1bbe875
fix: atomic file path
Feb 7, 2024
46b4f5a
Merge branch 'main' of https://github.com/boostcampaitech6/level2-mov…
Feb 7, 2024
1d93fbf
feat: add data preprocess #6
jinmin111 Feb 10, 2024
2275c5e
feat: add MultiDAE, MultiVAE #6
jinmin111 Feb 10, 2024
e8c1498
feat: add train #6
jinmin111 Feb 10, 2024
f6f70fd
docs: add requirements.txt #6
jinmin111 Feb 10, 2024
feb38e3
fix: change column name for sequential model
SiwooPark00 Feb 11, 2024
4d88b14
refactor: add drop_out argument #6
jinmin111 Feb 12, 2024
ec3684a
feat: sequential model
chris3427 Feb 13, 2024
3ee9aa5
feat/add deep_fm model
ksb3966 Feb 14, 2024
4017751
feat: add inference #6
jinmin111 Feb 15, 2024
d84249e
fix: create the save path during save if it does not exist
SiwooPark00 Feb 15, 2024
048abee
update gitignore
SiwooPark00 Feb 15, 2024
a6b35ad
fix: fix arg bug #6
jinmin111 Feb 15, 2024
e7dcf85
Merge branch 'feat/6-MultiDAE' of https://github.com/boostcampaitech6…
jinmin111 Feb 15, 2024
323ec0f
Merge pull request #11 from boostcampaitech6/feat/6-MultiDAE
SiwooPark00 Feb 15, 2024
d0a5c6c
fix: recvae final config
SiwooPark00 Feb 15, 2024
7089e5e
feat: add recbole ensemble
SiwooPark00 Feb 18, 2024
414f0a6
feat: add slimElastic model
Feb 19, 2024
38a8b0b
Merge pull request #17 from boostcampaitech6/feat/14-recbole_ensemble
arctic890 Feb 19, 2024
80c2e78
Feat. Add Bert4Rec Model Base
ksb3966 Feb 19, 2024
d952ef9
feat:EDA
chris3427 Feb 20, 2024
e5b98db
feat: add custom EASE
Feb 21, 2024
fe3ef4a
add: ADMMSLIM config
SiwooPark00 Feb 21, 2024
c5fe1eb
feat: hard voting using output files
Feb 21, 2024
f98c73f
fear:EDA
chris3427 Feb 21, 2024
c73c50f
feat: add DiffRec
Feb 22, 2024
b7b2f75
feat: add LightGCN config
Feb 22, 2024
47b72ac
Update DiffRec.yaml
arctic890 Feb 22, 2024
a5b41d6
feat:README
chris3427 Feb 23, 2024
5f72bc1
fix:conflicts
chris3427 Feb 23, 2024
259dec3
Merge pull request #18 from boostcampaitech6/Feat/Bert4Rec
arctic890 Feb 23, 2024
4a83399
Merge pull request #19 from boostcampaitech6/feat/16-EASE_custom
arctic890 Feb 23, 2024
ea2d7ed
Merge pull request #21 from boostcampaitech6/feat/17-hard_voting
arctic890 Feb 23, 2024
e4e4264
Merge pull request #22 from boostcampaitech6/feat/9-sequential_model
arctic890 Feb 23, 2024
86ad313
Update DiffRec.yaml
arctic890 Feb 23, 2024
12d29cc
Merge pull request #23 from boostcampaitech6/feat/18-DiffRec
arctic890 Feb 23, 2024
eba4400
feat: add weight and score
Feb 23, 2024
251b0fb
feat: add new user features
Feb 23, 2024
f16ba49
Feat/Modifying Bert4rec Directory Path
ksb3966 Feb 26, 2024
dcf3444
Feat/Move File to New Directory
ksb3966 Feb 26, 2024
7b7e8bc
Feat/Add New Model EASER
ksb3966 Feb 26, 2024
26c615d
Create README.md
ksb3966 Feb 26, 2024
0ef2e6a
Add Wrap-Up Report
ksb3966 Feb 26, 2024
70823d7
Add Wrap-up Report
ksb3966 Feb 26, 2024
7591e46
Update README.md
ksb3966 Feb 26, 2024
d78f4dd
Delete Movie Recommendation Wrap-up Report - Suggestify.pdf
ksb3966 Feb 26, 2024
c07016c
Merge pull request #24 from boostcampaitech6/feat/19-voting
chris3427 Feb 26, 2024
e5a0d99
Merge pull request #25 from boostcampaitech6/feat/20-user_feature
chris3427 Feb 26, 2024
17 changes: 17 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_template.md
@@ -0,0 +1,17 @@
---
name: Bug Report
about: Template for filing a bug report
title: "[BUG] "
labels: bug

---

## Description

## How to reproduce

1.
2.
3.

## Solution
16 changes: 16 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_template.md
@@ -0,0 +1,16 @@
---
name: Feature Request
about: Template for requesting a new feature
title: "[FEAT] "
labels: feature
assignees: ''

---

## Description


## Todo
- [ ]

## ETC
12 changes: 12 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,12 @@
## Overview
-

## Change Log
-

## To Reviewer
-

## Issue Tags
- Closed | Fixed: #
- See also: #
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
output
wandb
__pycache__
saved
submit
log
log_tensorboard
data
Binary file not shown.
136 changes: 136 additions & 0 deletions README.md
@@ -0,0 +1,136 @@
# Movie Recommendation

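## Overview
This project, 'Movie Recommendation', differs from typical competitions that use a user's movie-watching history to predict which movie the user will watch next or will like. Here, 10 items are randomly removed from each user's watching history, and the task is to predict which items were removed.

Ten movies are recommended to every user, and the accuracy of the recommendation list (Recall@10) is the evaluation metric.

The ground-truth data for evaluation is based on a Sequential Recommendation scenario and assumes that some items have been dropped out of each user's time-ordered sequence.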
<br><br>
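
The metric described above can be sketched as follows. This is a minimal illustration, assuming the plain definition of recall (hits divided by the number of held-out items); the competition's exact normalization is not stated here, and the function names `recall_at_k` and `mean_recall_at_k` are hypothetical.

```python
from typing import Dict, List, Sequence


def recall_at_k(recommended: Sequence[int], held_out: Sequence[int], k: int = 10) -> float:
    """Fraction of held-out items that appear among the top-k recommendations."""
    truth = set(held_out)
    if not truth:
        return 0.0
    hits = len(set(recommended[:k]) & truth)
    return hits / len(truth)


def mean_recall_at_k(
    recs: Dict[int, List[int]], truth: Dict[int, List[int]], k: int = 10
) -> float:
    """Average recall@k over every user that has held-out items in `truth`."""
    if not truth:
        return 0.0
    total = sum(recall_at_k(recs.get(user, []), items, k) for user, items in truth.items())
    return total / len(truth)
```

A user with no recommendations simply contributes a recall of zero to the average.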

## Component

### Project directory structure
```
📦level2-movierecommendation-recsys-03-main
 ┗ 📂code
   ┣ 📂EASER
   ┣ 📂EDA
   ┣ 📂bert4rec
   ┣ 📂custom
   ┣ 📂multi
   ┣ 📂recbole
   ┃ ┗ 📂configs - .yaml
   ┃   ┣ ADMMSLIM, DIFFRec, EASE, deepfM
   ┃   ┣ fm, lightgcn, ract, recvae,
   ┃   ┗ seq, slim
   ┣ 📂s3rec
   ┗ 📂voting
```
### Dataset structure
```
📦level2-movierecommendation-recsys-03-main
 ┗ 📂train
   ┣ 📜Ml_item2attributes.json
   ┣ 📜directors.tsv
   ┣ 📜genres.tsv
   ┣ 📜titles.tsv
   ┣ 📜train_ratings.csv
   ┣ 📜writers.tsv
   ┗ 📜years.tsv
```


## Team
<br>
<table align="left">
<tr height="155px">
<td align="center" width="150px">
<a href="https://github.com/ksb3966"><img src="https://github.com/ksb3966.png" width="100px;" alt=""/></a>
</td>
<td align="center" width="150px">
<a href="https://github.com/SiwooPark00"><img src="https://github.com/SiwooPark00.png" width="100px;" alt=""/></a>
</td>
<td align="center" width="150px">
<a href="https://github.com/arctic890"><img src="https://github.com/arctic890.png" width="100px;" alt=""/></a>
</td>
<td align="center" width="150px">
<a href="https://github.com/JaeGwon-Lee"><img src="https://github.com/JaeGwon-Lee.png" width="100px;" alt=""/></a>
</td>
<td align="center" width="150px">
<a href="https://github.com/jinmin111"><img src="https://github.com/jinmin111.png" width="100px;" alt=""/></a>
</td>
<td align="center" width="150px">
<a href="https://github.com/chris3427"><img src="https://github.com/chris3427.png" width="100px;" alt=""/></a>
</td>
</tr>
<tr height="80px">
<td align="center" width="150px">
<a href="https://github.com/ksb3966">김수빈_T6021</a>
</td>
<td align="center" width="150px">
<a href="https://github.com/SiwooPark00">박시우_T6060</a>
</td>
<td align="center" width="150px">
<a href="https://github.com/arctic890">백승빈_T6075</a>
</td>
<td align="center" width="150px">
<a href="https://github.com/JaeGwon-Lee">이재권_T6131</a>
</td>
<td align="center" width="150px">
<a href="https://github.com/jinmin111">이진민_T6139</a>
</td>
<td align="center" width="150px">
<a href="https://github.com/chris3427">장재원_T6149</a>
</td>
</tr>
</table>
&nbsp;
<br>

## Role

| Name | Role |
| --- | --- |
| 김수빈 | EDA, model selection and tuning, EASER, SASRec, and Bert4Rec experiments |
| 박시우 | EDA, s3rec baseline cleanup, recbole baseline setup, RecVAE, ADMMSLIM, soft-voting ensemble implementation |
| 백승빈 | EDA, EASE, SLIMElastic, and DiffRec experiments and tuning, hard-voting ensemble implementation |
| 이재권 | EDA, LightGCN experiments and tuning |
| 이진민 | EDA, Multi-DAE and Multi-VAE code modularization and experiments |
| 장재원 | EDA, sequential models, type-based model, performance comparison and analysis across models |
<br>

## Experiment Result

### Single Model Result
| | Public Recall@10 | Private Recall@10 |
| --- | --- | --- |
| Popular item rule based model | 0.0673 | 0.0671 |
| Genre rule based model | 0.0619 | 0.0626 |
| Type based model | 0.0687 | 0.0696 |
| GRU4Rec | 0.0970 | 0.0809 |
| SASRec | 0.0884 | 0.0833 |
| S3Rec(Pretrained) | 0.0829 | 0.0743 |
| BERT4Rec | 0.0687 | 0.0676 |
| LightGCN | 0.1302 | 0.1316 |
| DiffRec | 0.1413 | 0.1431 |
| RecVAE | 0.1349 | 0.1362 |
| Multi-VAE | 0.1394 | 0.1377 |
| Multi-DAE (with side information) | 0.1427 | 0.1413 |
| EASE | 0.1566 | 0.1565 |
| ADMMSLIM | 0.1524 | 0.1541 |
| SLIMElastic | 0.1562 | 0.1562 |
| EASER | 0.1612 | 0.1603 |

### Ensemble Result
| | Private Recall@10 | Public Recall@10 |
| --- | --- | --- |
| EASER and the ensemble model* with type-based recommendation applied | 0.1614 | 0.1605 |
| Ensemble model* (EASE, ADMMSLIM, RecVAE) | 0.1613 | 0.1611 |

In the end, we submitted the two results with the highest Public scores.
<br><br>
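
The hard-voting idea behind the ensemble rows can be sketched as below. This is an illustrative sketch only, not the team's implementation: it assumes each model contributes one duplicate-free top-k list per user, counts one vote per model per item, and breaks ties by the best rank an item reached and then by item id. The name `hard_vote` and the tie-break rule are assumptions.

```python
from collections import Counter
from typing import Dict, List


def hard_vote(model_outputs: List[Dict[int, List[int]]], k: int = 10) -> Dict[int, List[int]]:
    """Combine per-user recommendation lists from several models by majority vote."""
    users = set().union(*(out.keys() for out in model_outputs))
    result: Dict[int, List[int]] = {}
    for user in sorted(users):
        votes: Counter = Counter()
        best_rank: Dict[int, int] = {}
        for out in model_outputs:
            for rank, item in enumerate(out.get(user, [])):
                votes[item] += 1
                # Remember the best (lowest) position the item reached in any list.
                best_rank[item] = min(rank, best_rank.get(item, rank))
        # Most votes first; ties broken by best rank, then by item id (assumed rule).
        ranked = sorted(votes, key=lambda it: (-votes[it], best_rank[it], it))
        result[user] = ranked[:k]
    return result
```

Soft voting would replace the vote count with a weighted sum of model scores; that variant is not shown here.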

## Wrap-Up Report
[MovieRec Wrap-up Report - Suggestify.pdf](./Movie_Recommendation_Wrap-up_Report-Suggestify.pdf)
<br>
77 changes: 77 additions & 0 deletions code/EASER/dataloader.py
@@ -0,0 +1,77 @@
import pandas as pd
import os
from pytz import timezone
from train_args import parse_args
from collections import defaultdict
from typing import Dict, List, Tuple
import numpy as np

import warnings
warnings.filterwarnings(action="ignore")


class Dataloader():
    def __init__(self, config):
        self.config = config
        self.df = pd.read_csv(os.path.join(self.config.data_path, 'train_ratings.csv'))

        self.users = self.df.user.unique()
        self.items = self.df.item.unique()

        self.item_encoder, self.item_decoder = self.generate_encoder_decoder('item')
        self.user_encoder, self.user_decoder = self.generate_encoder_decoder('user')
        self.num_item, self.num_user = len(self.item_encoder), len(self.user_encoder)

        # Map raw ids to contiguous indices.
        self.df['item_idx'] = self.df['item'].map(self.item_encoder)
        self.df['user_idx'] = self.df['user'].map(self.user_encoder)

        self.user_train, self.user_valid = self.generate_sequence_data()

    def dataloader(self):
        return self.df, self.users, self.items

    def generate_encoder_decoder(self, col: str) -> Tuple[dict, dict]:
        """
        Build the encoder and decoder for one column.

        Args:
            col (str): name of the column to encode
        Returns:
            Tuple[dict, dict]: encoder (raw id -> index) and decoder (index -> raw id)
        """

        encoder = {}
        decoder = {}
        ids = self.df[col].unique()

        for idx, _id in enumerate(ids):
            encoder[_id] = idx
            decoder[idx] = _id

        return encoder, decoder

    def generate_sequence_data(self) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
        """
        Build the per-user train and validation item lists.

        Returns:
            Tuple[dict, dict]: train user sequence / valid user sequence
        """
        users = defaultdict(list)
        user_train = {}
        user_valid = {}
        for user, item in zip(self.df['user_idx'], self.df['item_idx']):
            users[user].append(item)

        for user in users:
            # Re-seeding for every user makes each split reproducible on its own;
            # users with histories of equal length hold out the same positions.
            np.random.seed(self.config.seed)

            user_total = users[user]
            valid = np.random.choice(user_total, size=self.config.valid_samples, replace=False).tolist()
            # Set difference: drops duplicates and does not preserve the original order.
            train = list(set(user_total) - set(valid))

            user_train[user] = train
            user_valid[user] = valid  # hold out valid_samples items per user (to mirror the actual task as closely as possible)

        return user_train, user_valid

    def get_train_valid_data(self):
        return self.user_train, self.user_valid
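
The per-user random holdout performed by `generate_sequence_data` can also be sketched as a standalone function. This is an alternative illustration, not the code in this PR: it assumes a single `numpy.random.Generator` shared across users, keeps the remaining items in their original order, and always leaves at least one item for training. The name `random_holdout` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np


def random_holdout(
    user_items: Dict[int, List[int]], n_valid: int = 10, seed: int = 42
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Randomly hold out up to n_valid items per user; keep the rest for training."""
    rng = np.random.default_rng(seed)  # one generator shared by all users
    train: Dict[int, List[int]] = {}
    valid: Dict[int, List[int]] = {}
    for user, items in user_items.items():
        items = list(dict.fromkeys(items))  # drop duplicates, keep order
        n = min(n_valid, max(len(items) - 1, 0))  # leave at least one training item
        if n == 0:
            train[user], valid[user] = items, []
            continue
        held = set(rng.choice(len(items), size=n, replace=False).tolist())
        train[user] = [it for i, it in enumerate(items) if i not in held]
        valid[user] = [it for i, it in enumerate(items) if i in held]
    return train, valid
```

Because the generator is seeded once, the split is still reproducible for a fixed seed and user order, while different users no longer share the same held-out positions.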