Site Parser

This is a test parser that extracts news articles and saves them to a CSV file using the Selectolax library.

Features

Page range selection:
You can set the range of pages (start page and end page) to collect data from in the .env file.
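A minimal sketch of how a configured page range might drive the crawl. The URL template `BASE_URL` and the helper `page_urls` are hypothetical (the README does not document the target site's URL scheme), and because it does not say whether END_PAGE is inclusive, the sketch exposes that as a flag:

```python
# Hypothetical URL template: the real site and its URL scheme are not documented here.
BASE_URL = "https://example.com/news/{category}?page={page}"


def page_urls(start: int, end: int, category: str, inclusive: bool = True) -> list[str]:
    """Build one URL per page in the configured range.

    Whether END_PAGE is inclusive is an assumption, so it is a parameter.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {start}..{end}")
    stop = end + 1 if inclusive else end
    return [BASE_URL.format(category=category, page=page) for page in range(start, stop)]
```

For example, `page_urls(1, 3, "economy-trade")` yields three URLs for pages 1 to 3; each would then be fetched and parsed in turn.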
Asynchronous translation into Russian:
The parser can also asynchronously translate all extracted text inside p and li tags.
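A minimal sketch of translating many text fragments concurrently with `asyncio`, assuming the p and li texts have already been extracted into a list of strings. `translate_to_russian` is a hypothetical stub standing in for whatever translation service the project actually calls, and the concurrency limit is an illustrative assumption:

```python
import asyncio


async def translate_to_russian(text: str) -> str:
    """Hypothetical stub: a real version would call a translation API."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"[ru] {text}"


async def translate_all(texts: list[str], max_concurrency: int = 5) -> list[str]:
    """Translate all texts concurrently while preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text: str) -> str:
        # Limit how many translation requests are in flight at once.
        async with semaphore:
            return await translate_to_russian(text)

    # gather() returns results in the same order as the coroutines passed in.
    return await asyncio.gather(*(worker(t) for t in texts))
```

Calling `asyncio.run(translate_all(paragraphs))` returns the translations in the same order as `paragraphs`, so each one can be matched back to its source element.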
Further improvements:
Support for parsing other sites will be added in the future, as well as the ability to parse several sites at the same time.

1. Clone the repository:
```
git clone https://github.com/yourusername/Site_parser.git
cd Site_parser
```

2. Install Poetry (the command below is for Windows PowerShell) and check that it is installed:
```
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
poetry --version
```

3. Install Project Dependencies:
```
poetry install
```
4. Set Up Environment Variables:
The project uses environment variables to store settings such as the start and end page. Create a .env file in the root of the project directory.
- 4.1. Create a .env file:
```
touch .env
```
- 4.2. Add the range of pages you want to parse and the news category (example below):
```
START_PAGE=1
END_PAGE=3
NEWS_CATEGORY=economy-trade
```
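The README does not say how these values are loaded (python-dotenv is a common choice). A minimal stdlib-only sketch of parsing such a file and validating the page range; `parse_env`, `settings_from_env`, `load_settings` and the `Settings` class are hypothetical names:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    start_page: int
    end_page: int
    news_category: str


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def settings_from_env(env: dict[str, str]) -> Settings:
    """Convert raw strings to typed settings and check the page range."""
    settings = Settings(
        start_page=int(env["START_PAGE"]),
        end_page=int(env["END_PAGE"]),
        news_category=env["NEWS_CATEGORY"],
    )
    if settings.start_page > settings.end_page:
        raise ValueError("START_PAGE must not be greater than END_PAGE")
    return settings


def load_settings(path: str = ".env") -> Settings:
    """Read and validate the .env file at the given path."""
    return settings_from_env(parse_env(Path(path).read_text(encoding="utf-8")))
```

Validating the range at startup means a typo such as START_PAGE=5 with END_PAGE=3 fails immediately instead of silently collecting nothing.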
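5. Running the Code:
- 5.1. Optionally, activate the virtual environment created by Poetry (poetry run in step 5.2 uses it either way; in Poetry 2.0 and later, poetry shell is provided by the poetry-plugin-shell plugin):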
```
poetry shell
```
- 5.2. Run the script to start collecting news:
```
poetry run python src/main.py
```
6. Output:
- 6.1. Example output:
```
The data from the page 1 is collected
The data from the page 2 is collected
```
After the script finishes, the extracted news items are saved to a CSV file at data/data/results.csv. Each row contains the title, date, link (href), short text, main text, country, and category of a news item.
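A minimal sketch of writing rows with those fields using the standard csv module. The column names in `FIELDNAMES` and the helpers `rows_to_csv` and `save_news` are assumptions derived from the field list above, not the project's actual header:

```python
import csv
import io
from pathlib import Path

# Assumed column names mirroring the fields listed above.
FIELDNAMES = ["title", "date", "href", "short_text", "main_text", "country", "category"]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize news rows to CSV text; fields with commas or quotes are quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_news(rows: list[dict[str, str]], path: str = "data/data/results.csv") -> None:
    """Write rows to the CSV file, creating parent directories if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # newline="" keeps the csv module's \r\n line endings unchanged on every OS.
    with out.open("w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv(rows))
```

Using `csv.DictWriter` rather than joining strings by hand means news text containing commas, quotes or line breaks is escaped correctly.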
