This repository has been archived by the owner on Apr 30, 2024. It is now read-only.

Merge pull request #25 from ericpan64/update-html-and-css
DuGuo 0.2.0 - Frontend Refactor
Eric Pan authored Dec 31, 2021
2 parents 4218106 + 598d578 commit c35ebe0
Showing 85 changed files with 23,195 additions and 1,191 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -6,4 +6,4 @@ app/temp
app/src/config.rs
data_services/config.py
app/Cargo.lock
app/templates/static/docs
app/static/docs
19 changes: 12 additions & 7 deletions LICENSE
@@ -1,10 +1,15 @@
Begin license text.
Copyright 2021 Eric Pan
DuGuo Chinese Reading App
Copyright (C) 2022 Eric Pan

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

End license text.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
62 changes: 35 additions & 27 deletions README.md
@@ -1,51 +1,59 @@
# DuGuo
[![docs: 0.1.0](https://img.shields.io/badge/Docs-0.1.0-blue)](https://duguo-app.com/static/doc/duguo/index.html)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
## Overview
[![docs: 0.2.0](https://img.shields.io/badge/Docs-0.2.0-blue)](https://duguo.app/static/doc/duguo/index.html)
[![License: AGPL](https://img.shields.io/badge/License-AGPL-yellow.svg)](https://www.gnu.org/licenses/agpl-3.0.en.html)

DuGuo is a web application that allows users to read Chinese text in an interactive learning environment including pinyin support, speech-to-text, and a lookup dictionary. Building from existing solutions, DuGuo aims to provide the best UX for contextual learning while remaining open-source. This app is designed in particular for L2 (second-language) learners, though hopefully it is useful for all levels of Chinese learning!
## Overview
DuGuo is an open-source web application that allows users to read Chinese text in an interactive learning environment. The main features include:
- Phonetic support (Pinyin + Zhuyin) and phrase lookup via [CC-CEDICT](https://cc-cedict.org/wiki/)
- Phrase tokenization via [spaCy](https://spacy.io/)
- Text-to-speech via the [SpeechSynthesis API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis)
- Transposition between Simplified + Traditional Chinese text
- ... other ideas tbd - view + contribute in the [Issues](https://github.com/ericpan64/DuGuo-Chinese-Reading-App/issues) tab!

### Deployment
This app is designed in particular for L2 (second-language) learners, though hopefully it is useful for all levels of Chinese learning!

The app is currently available at [duguo.app](https://duguo.app) (redirects to [duguo-app.com](https://duguo-app.com)). The production deployment is hosted on AWS. This repository contains all code and configuration to run an instance locally using docker-compose (from the root directory, run `docker-compose up`).
Check it out at [duguo.app](https://duguo.app)! "Production" is hosted on GCP while this repository contains all code and configuration to run an instance locally using docker-compose (from the root directory, run `docker-compose up`).

### Tech Stack
The app has 2 microservices:
1. A web server written in Rust using [Rocket](https://rocket.rs/)
2. An NLP tokenization service written in Python primarily using [spaCy's Chinese module](https://spacy.io/models/zh) (which builds on top of [jieba](https://github.com/fxsjy/jieba))
- [OpenCC](https://github.com/BYVoid/OpenCC) and [pypinyin](https://github.com/mozillazg/python-pinyin) are used during processing.

The app is made of 3 components:
1. A web server written in Rust using [Rocket](https://rocket.rs/) and other assorted libraries (see the [Cargo.toml](app/Cargo.toml) file).
2. An NLP tokenization service written in Python using [spaCy's Chinese module](https://spacy.io/models/zh) (which builds on top of [jieba](https://github.com/fxsjy/jieba)). [OpenCC](https://github.com/BYVoid/OpenCC) and [pypinyin](https://github.com/mozillazg/python-pinyin) are used during processing.
3. Data persistence via a database ([mongoDB](https://www.mongodb.com/)) and a cache ([Redis](https://redis.io/)).
For data persistence, [mongoDB](https://www.mongodb.com/) and [Redis](https://redis.io/) are used.

Tokenized words are looked up in [CC-CEDICT](https://cc-cedict.org/wiki/), which is generously available for use under a Creative Commons license. Radical information (for saved vocab) is sourced from [this web API](http://ccdb.hemiola.com/) and can be quickly accessed using the accompanying [Hemiola Chinese Character Browser](http://hanzi.hemiola.com/).
Tokenized words are looked up in [CC-CEDICT](https://cc-cedict.org/wiki/), which is generously available under a Creative Commons license. Radical information (for saved vocab) is sourced from [this web API](http://ccdb.hemiola.com/) and can be quickly accessed using the accompanying [Hemiola Chinese Character Browser](http://hanzi.hemiola.com/).
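
As an illustration of the dictionary lookup step, here is a minimal sketch of parsing one CC-CEDICT entry. It is not DuGuo's actual code: it only assumes the published CC-CEDICT line layout (`Traditional Simplified [pin1 yin1] /gloss/gloss/`), and the names `CedictEntry` and `parse_cedict_line` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# One CC-CEDICT entry per line: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
_ENTRY = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


class CedictEntry(NamedTuple):
    """Hypothetical container for a parsed dictionary line."""
    traditional: str
    simplified: str
    pinyin: str
    definitions: List[str]


def parse_cedict_line(line: str) -> Optional[CedictEntry]:
    """Parse a single line; return None for comments, blanks, or malformed input."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _ENTRY.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    # Glosses are separated by "/" inside the outer slashes.
    return CedictEntry(traditional, simplified, pinyin, glosses.split("/"))


entry = parse_cedict_line("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/")
```

A tokenizer's output phrases could then be matched against a dictionary keyed on `simplified` or `traditional`, depending on which script the input text uses.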

## Motivation
Learning Chinese as a second language is hard for many reasons. To start, Chinese characters are logographic whereas English characters are alphabetic - this necessitates a fundamentally different approach to phrase memorization. Additionally, phrase pronunciation requires learning technical phonetic syntax (e.g. pinyin) which is rarely used by natives and virtually non-existent in practice.

Learning Chinese as a second language is hard for many reasons: it is logographic (whereas English is alphabetic) which necessitates extensive memorization, and starting out requires learning technical phonetic syntax (pinyin) which is quickly deprecated and virtually non-existent in practice.

While there are many more nuanced approaches to Chinese learning (e.g. the [HSK framework](https://en.wikipedia.org/wiki/Hanyu_Shuiping_Kaoshi)), in my simple opinion there are really 3 "tiers" that need to be mastered for Chinese reading:

1. All-pinyin (for absolute beginners)
2. Some-pinyin (roughly grade-school level for native Chinese speakers)
3. No-pinyin (adult level)
While there are many more nuanced approaches to Chinese learning (e.g. the [HSK framework](https://en.wikipedia.org/wiki/Hanyu_Shuiping_Kaoshi)), one simplified view is that there are 3 levels of Chinese reading mastery:
1. Almost entirely pinyin-dependent (for beginners and L2 learners that can speak but can't read, like myself...)
2. Some pinyin needed (roughly grade-school level for native Chinese speakers)
3. Almost no pinyin needed (adult level - phrases are either memorized or able to be intuited based on the context)

Below are images to provide a visual reference. While for natives the jump from tier 1 to 3 is trivial, for L2 learners it can feel insurmountable!

[<img src="design/images/textbook-beginner.jpg" alt="A beginner-level Chinese textbook with pinyin included for all words ('Tier 1')." width="250">](design/images/textbook-beginner.jpg)
[<img src="design/images/textbook-intermediate.jpg" alt="An intermediate-level Chinese textbook with pinyin for some words ('Tier 2'). In practice, this is grade-school level for natives!" width="250">](design/images/textbook-intermediate.jpg)
[<img src="design/images/newspaper-hard.jpg" alt="A native-level article from a Chinese newspaper ('Tier 3'). No pinyin is used at all, since natives don't really need it!" width="325">](design/images/newspaper-hard.jpg)
1. [<img src="design/images/textbook-beginner.jpg" alt="A beginner-level Chinese textbook with pinyin included for all words ('Tier 1')." width="350">](design/images/textbook-beginner.jpg)
2. [<img src="design/images/textbook-intermediate.jpg" alt="An intermediate-level Chinese textbook with pinyin for some words ('Tier 2'). In practice, this is grade-school level for natives!" width="350">](design/images/textbook-intermediate.jpg)
3. [<img src="design/images/newspaper-hard.jpg" alt="A native-level article from a Chinese newspaper ('Tier 3'). No pinyin is used at all, since natives don't really need it!" width="350">](design/images/newspaper-hard.jpg)

### Contextual Learning

Contextual learning is arguably the best way to learn a language. People remember things that are linked to experiences or assorted significant pieces of information. For natives, learning Chinese is essential. However for L2 learners, finding the urgency to learn is uniquely difficult without an external driving force (e.g. living in a Chinese-speaking country).

Barring the ability to live in a foreign country, DuGuo hopes to offer the next-best thing by allowing users to pick what they want to read (improving contextual relevance) and saving contextual references for "learned" phrases (adding contextual triggers).

## Existing Tools

## Other Existing Tools
There are several existing tools that provide similar functionality, including (but not limited to): [Zhongwen Chrome Extension](https://chrome.google.com/webstore/detail/zhongwen-chinese-english/kkmlkkjojmombglmlpbpapmhcaljjkde?hl=en), [Purple Culture Pinyin Converter](https://www.purpleculture.net/chinese-pinyin-converter/), [Du Chinese (mobile)](https://www.duchinese.net/), [mdbg.net](https://www.mdbg.net/chinese/dictionary), [Hànzì Analyzer](http://hemiola.com/), [pin1yin1](https://www.pin1yin1.com/), etc.

The main differentiator I hope to provide with this project is improved UX (pinyin toggling, contextual saving) and the ability to persist documents to a database (allows building a long-term knowledge base). Ultimately this is provided as an additional tool to help users learn Chinese, so definitely use the combination of tools that best supplements your learning experience.
The main differentiators DuGuo hopes to provide with this project are improved UX, progress persistence (via accounts), document difficulty scoring (in progress), and [Duey](app/static/img/duey/duey_extra_happy.png)! Ultimately this is provided as an additional tool to help users learn Chinese, so definitely use the combination of tools that best supplements your learning experience.

## Acknowledgement

## Acknowledgements
This project was adopted from Martin Kess's previous CS6460 final project, the Chinese Reading Machine (中文读机). He provided the starter code (in Python Flask) and a strong existing framework to build on. The images for Duey came from Dzaky Taufik (his Upwork linked [here](https://www.upwork.com/freelancers/~013f8e6de5a2a64421)). 感谢 and 大家加油!

This project was adopted from Martin Kess's previous CS6460 final project, the Chinese Reading Machine (中文读机). He provided the starter code (in Python Flask) and a strong existing framework to build on. 感谢!
[<img src="app/static/img/duey/duey_base_normal.png" alt="Duey!" width="150">](app/static/img/duey/duey_base_normal.png)
[<img src="app/static/img/duey/duey_base_confused.png" alt="Confused Duey?" width="150">](app/static/img/duey/duey_base_confused.png)
[<img src="app/static/img/duey/duey_base_surprised.png" alt="Surprised Duey" width="150">](app/static/img/duey/duey_base_surprised.png)
[<img src="app/static/img/duey/duey_base_worried.png" alt="Worried Duey :-(" width="150">](app/static/img/duey/duey_base_worried.png)
[<img src="app/static/img/duey/duey_base_happy.png" alt="Happy Duey!" width="150">](app/static/img/duey/duey_base_happy.png)
7 changes: 3 additions & 4 deletions app/Cargo.toml
@@ -3,7 +3,7 @@ name = "duguo"
description = "A Rocket web-app designed to facilitate learning how to read Chinese."
repository = "https://github.com/ericpan64/DuGuo-Chinese-Reading-App"
homepage = "https://duguo-app.com"
version = "0.1.0"
version = "0.2.0"
authors = ["epan"]
edition = "2018"
license = "MIT"
@@ -18,15 +18,14 @@ hex = "0.3.1"
serde = "1.0.118"
jsonwebtoken = "7.2.0"
chrono = "0.4.19"
reqwest = { version = "0.10.2", features = ["blocking"] }
reqwest = { version = "0.10.2", features = ["json"] }
scraper = "0.12.0"
redis = { version = "0.17.0", features = ["tokio-comp"] }
regex = "1"
rand = "0.8.0"
itertools = "0.10.0"

# Docs: https://api.rocket.rs/v0.4/rocket_contrib/
[dependencies.rocket_contrib]
version = "0.4.6"
default-features = false
features = ["tera_templates", "serve", "json"]
features = ["serve", "json", "tera_templates"]
7 changes: 6 additions & 1 deletion app/Dockerfile
@@ -1,7 +1,12 @@
FROM rustlang/rust:nightly
FROM rustlang/rust:nightly-slim
WORKDIR /app
COPY . .
EXPOSE 8000
RUN apt-get update
RUN apt-get install -y \
    libssl-dev \
    pkg-config
RUN cargo clean

# Testing
RUN cargo build