For Better Essay 📚

An AI-Driven Program to Improve Korean Literacy for Students

Empowering literacy through machine summarization and the utilization of a comprehensive Korean book corpus.

This document is a translated version of the original Korean README.

Awarded First Prize in Competition (Competition Award_ref.pdf)

Result

Description

This project addresses the growing concern about literacy decline among younger generations in Korea. It targets middle and high school students, providing a web service where users can practice document summarization.

Project Procedure

Scenario Design

Reading passages of specific lengths are drawn from the Korean book corpus. The AI model generates a machine summary for each passage; users then write their own summary and compare it with the machine-generated one. The system scores the user's summary and returns one of the following result categories:

Perfect : Score ≥ 0.8
Great : 0.6 ≤ Score < 0.8
Good : 0.4 ≤ Score < 0.6
Try again : Score < 0.4
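
The thresholds above can be sketched as a small lookup function. This is only an illustration of the category boundaries listed here; the function name `grade` is hypothetical and is not taken from the project code.

```python
def grade(score: float) -> str:
    """Map a similarity score to the result category shown to the user."""
    if score >= 0.8:
        return "Perfect"
    if score >= 0.6:
        return "Great"
    if score >= 0.4:
        return "Good"
    return "Try again"


if __name__ == "__main__":
    for s in (0.85, 0.6, 0.45, 0.1):
        print(s, grade(s))
```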

Data Building and Preprocessing

- Extract the required fields from the JSON files and organize them into CSV format.
- Convert sentence-level data into paragraphs.
- Add columns that support difficulty-level features in the service.
- Merge corpus data by category.
- Store the processed data in a database format for faster service execution.
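
A minimal sketch of the first three steps, using only the Python standard library. The JSON layout, the field names (`category`, `sentences`), and the `level` column are assumptions made for illustration; the actual corpus schema is not described in this README. The 600-character boundary is borrowed from the feature list of the web service.

```python
import csv
import io
import json

# Hypothetical raw record: one passage stored as a list of sentences.
RAW_JSON = """
[{"category": "history",
  "sentences": ["첫 번째 문장입니다.", "두 번째 문장입니다."]}]
"""

FIELDS = ["category", "paragraph", "length", "level"]


def build_rows(raw_json):
    """Join sentences into one paragraph and add length/level columns."""
    rows = []
    for record in json.loads(raw_json):
        paragraph = " ".join(s.strip() for s in record["sentences"])
        rows.append({
            "category": record["category"],
            "paragraph": paragraph,
            "length": len(paragraph),
            # Assumed rule: passages over 600 characters go to the advanced mode.
            "level": "advanced" if len(paragraph) > 600 else "topic",
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(build_rows(RAW_JSON)))
```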

Pretraining - ET5 model

The project employs ET5, a Korean language model developed by ETRI. ET5 strengthens both language understanding and generation by pretraining simultaneously on blank-filling (the T5 training objective) and next-word prediction (the GPT training objective).

The ET5 model was obtained by completing a usage agreement with ETRI via their data application page.

Fine Tuning

The pretrained ET5 model was fine-tuned using text summarization datasets from AI Hub. These datasets include 200,000 paragraphs (300-1,000 characters each) along with their corresponding summaries, allowing ET5 to generate summaries aligned with the characteristics of the given documents.

Evaluation : BERT SCORE

The project employs BERT score to evaluate machine summarization instead of the more commonly used ROUGE score, because ROUGE only counts exact word overlap and fails to capture contextual nuance.

Mechanism : Calculates cosine similarity between the contextual token embeddings of the reference sentences and the machine-generated sentences, greedily matches each token to its most similar counterpart, and combines the resulting precision and recall into an F1 score.
BERT score for our model : 0.8391
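
The greedy-matching step can be illustrated with a toy example. The sketch below uses hand-written two-dimensional vectors in place of real BERT embeddings and omits refinements such as importance weighting and baseline rescaling; the function names are hypothetical and this is not the project's evaluation code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_f1(ref_vecs, cand_vecs):
    """F1 from greedy matching of token embeddings.

    Recall: each reference token takes its best match in the candidate.
    Precision: each candidate token takes its best match in the reference.
    """
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]   # toy "token embeddings"
    candidate = [[1.0, 0.0]]
    print(round(greedy_f1(reference, candidate), 4))
```

In this toy case the candidate covers only one of the two reference tokens, so recall is 0.5 while precision is 1.0, which is why the F1 score falls below 1.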

Web service Implementation

< Main >

Users can select one of four topics or opt for the "실력UP"(Skill Up) advanced version.

< Topic >

Offers a broader range of themes compared to the main version.

< Summary >

Users read paragraphs and write their summaries for evaluation.

Features:
- Four topics and an advanced writing mode are available.
- Topic-based paragraphs are under 600 characters, while advanced writing involves random topics over 600 characters.
- Users can measure study time using a timer.
- Summaries (60-200 characters) are evaluated by comparing them with machine-generated summaries, displaying a score.
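
The 60-200 character limit on summaries could be enforced before scoring with a check like the one below. This is a sketch under assumptions: the function name is hypothetical, the bounds are treated as inclusive, and surrounding whitespace is ignored; the actual service may count characters differently.

```python
MIN_CHARS = 60    # lower bound from the feature list (assumed inclusive)
MAX_CHARS = 200   # upper bound from the feature list (assumed inclusive)


def check_summary_length(summary):
    """Return (is_valid, message) for a user-written summary."""
    length = len(summary.strip())
    if length < MIN_CHARS:
        return False, f"Too short: {length} characters (minimum {MIN_CHARS})."
    if length > MAX_CHARS:
        return False, f"Too long: {length} characters (maximum {MAX_CHARS})."
    return True, f"OK: {length} characters."


if __name__ == "__main__":
    print(check_summary_length("가" * 75))
```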

< User & Settings >

Allows users to manage personal information and adjust web service settings.
