Merged
33 commits
3012907
chore: add another entry for the same year
Firgrep Jun 30, 2025
bc3656e
change: regex matching to include keys
Firgrep Jun 30, 2025
69c6f82
fix: regex
Firgrep Jun 30, 2025
74d31ab
fix: regex2
Firgrep Jun 30, 2025
a2a2196
fix: regex3
Firgrep Jun 30, 2025
df394c9
chore: add single citation key test
Firgrep Jun 30, 2025
bc381b1
chore: add more tests for citation key
Firgrep Jun 30, 2025
988b0a2
chore: clean up comment
Firgrep Jun 30, 2025
b0985ce
change: update bibliography matching to support key
Firgrep Jun 30, 2025
3582da9
chore: expand text doc with citation key examples
Firgrep Jun 30, 2025
c703907
fix: verification for citation keys
Firgrep Jun 30, 2025
08c5332
chore: formatting
Firgrep Jun 30, 2025
c867252
feat: disambiguiation logic
Firgrep Jun 30, 2025
49c72af
chore: fix articles bibliographical entry and make it use disambiguat…
Firgrep Jul 1, 2025
7ca646f
chore: period and space after author date in bibliography
Firgrep Jul 1, 2025
1ae4e2a
fix: handle proper author and year disambiguation in biblio gen
Firgrep Jul 1, 2025
e5d6f7b
feat: transform keys to citations in article
Firgrep Jul 1, 2025
e4df230
fix: format authors last name only for disambiguated author date
Firgrep Jul 1, 2025
1eaf602
test: add integration test cases for disambiguation
Firgrep Jul 1, 2025
b24c2e5
fix: sorting logic to be disambiguation aware
Firgrep Jul 1, 2025
22ce315
chore: todos
Firgrep Jul 1, 2025
564d8e4
feat: check for ambiguous citations and add custom errors
Firgrep Jul 1, 2025
67ffa95
feat: add tests for ambiguous citations
Firgrep Jul 1, 2025
d03c074
chore: cleanup and todos
Firgrep Jul 1, 2025
7cf3072
chore: tweak ci activation
Firgrep Jul 1, 2025
9742193
chore: prevent ci from running in draft
Firgrep Jul 1, 2025
207b058
chore: tests for mixed keys and citations
Firgrep Jul 2, 2025
4bda669
chore: fix mock with key
Firgrep Jul 2, 2025
f9aa7d4
chore: comment addition
Firgrep Jul 2, 2025
b869c67
fix: race conditions with tests
Firgrep Jul 2, 2025
c0a59f1
docs: update docs for 0.4
Firgrep Jul 2, 2025
7f1c870
chore: bump minor
Firgrep Jul 2, 2025
4dfda36
fix: test
Firgrep Jul 2, 2025
11 changes: 8 additions & 3 deletions .github/workflows/ci.yaml
@@ -1,11 +1,16 @@
name: Continuous integration
on: [push, pull_request]
name: Continuous Integration
on:
  workflow_dispatch:
  pull_request:
    branches: ["main"]
    types: [opened, synchronize, ready_for_review]

jobs:
  ci:
    if: ${{ github.event.pull_request.draft == false }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - run: cargo build
      - run: cargo test
      - run: cargo test
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -7,7 +7,7 @@ repository = "https://github.com/systemphil/prepyrus"
readme = "README.md"
categories = ["database", "parser-implementations", "text-processing"]
keywords = ["bibtex", "biblatex", "mdx", "parser", "citation"]
version = "0.3.1"
version = "0.4.0"
edition = "2021"

[dependencies]
56 changes: 39 additions & 17 deletions README.md
@@ -10,7 +10,7 @@ that contain citations in Chicago author-date style and certain metadata.

## Usage

Run `cargo add prepyrus` or add the crate to your `Cargo.toml`:
Add the crate to your `Cargo.toml` and use it as shown below:

```toml
[dependencies]
@@ -20,27 +20,33 @@ prepyrus = "<latest_version>"
Main API interface is the `Prepyrus` impl. Example usage:

```rust
use prepyrus::Prepyrus;
use prepyrus::{
    cli::{Cli, Mode},
    Prepyrus
};

fn main() {
    let args = vec![
        "_program_index".to_string(),
        "tests/mocks/test.bib".to_string(), // bibliography file
        "tests/mocks/data".to_string(), // target directory or .mdx file
        "verify".to_string(), // mode
        "tests/mocks/data/development.mdx".to_string(), // optional ignore paths, separate with commas if multiple
    ];

    let _ = run(args).unwrap_or_else(|e| {
    let _ = run().unwrap_or_else(|e| {
        eprintln!("Error: {}", e);
        std::process::exit(1);
    });

    println!("Prepyrus completed successfully!");
}

fn run(args: Vec<String>) -> Result<(), Box<dyn std::error::Error>> {
    let config = Prepyrus::build_config(&args, None)?;
fn run() -> Result<(), Box<dyn std::error::Error>> {
    // Example Command Line Inputs
    let cli = Cli {
        bib_file: "tests/mocks/test.bib".to_string(),
        target_path: "tests/mocks/data-isolated".to_string(),
        mode: Mode::Verify,
        ignore_paths: Some(vec!["tests/mocks/data/development.mdx".into()]),
        generate_index_to_file: None,
        index_link_prefix_rewrite: None,
    };
    // Normally one would use let cli = Prepyrus::parse_cli();

    let config = Prepyrus::build_config(cli, None)?;
    let all_entries = Prepyrus::get_all_bib_entries(&config.bib_file).unwrap();
    let mdx_paths =
        Prepyrus::get_mdx_paths(&config.target_path, Some(config.settings.ignore_paths))?;
@@ -49,7 +55,7 @@ fn run(args: Vec<String>) -> Result<(), Box<dyn std::error::Error>> {
    let articles_file_data = Prepyrus::verify(mdx_paths, &all_entries)?;

    // Phase 2: Process MDX files (requires mode to be set to "process")
    if config.mode == "process" {
    if config.mode == Mode::Process {
        Prepyrus::process(articles_file_data);
    }

@@ -65,17 +71,23 @@ fn run(args: Vec<String>) -> Result<(), Box<dyn std::error::Error>> {

## Description

The tool is designed to work with MDX files that contain citations in Chicago author-date style. Examples:
The tool is designed to work with MDX files that contain citations in Chicago author-date style or by BibTeX key. Examples:

> "...nowhere on heaven or on earth is there anything which does not contain both being and nothing in itself" (Hegel 2010, 61).

> "The equilibrium in which coming-to-be and ceasing-to-be are poised is in the first place becoming itself" (@hegel2010logic, 81).

> "Existence proceeds from becoming" (see Hegel 2010, 61).

The tool parses and verifies the citations in the MDX files against a
bibliography file in BibTeX format (using Biblatex).
If the citations are valid, the tool processes the MDX files
by adding a bibliography section at the end of the file.
It also adds author, editor, and contributor from the MDX file metadata if available.
Finally, it also adds a notes heading at the end if footnotes are present in the file.

If BibTeX keys are used, they are replaced by disambiguated citations during `process` mode.
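
As a rough illustration of that replacement step, the sketch below swaps `@key` tokens for author-date labels. The function name `replace_keys`, the lookup map, and the assumed key character set are all hypothetical choices made for this example; it is not the crate's `transform_keys_to_citations` implementation.

```rust
use std::collections::HashMap;

/// Replaces every `@key` token that has an entry in `labels` with its label.
/// Unknown keys are left untouched. Hypothetical helper, not part of prepyrus.
fn replace_keys(text: &str, labels: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(at) = rest.find('@') {
        // Copy everything before the '@' unchanged.
        out.push_str(&rest[..at]);
        let after = &rest[at + 1..];
        // Assumption: a key is a run of ASCII letters, digits, '_' or '-'.
        let end = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '-'))
            .unwrap_or(after.len());
        let key = &after[..end];
        match labels.get(key) {
            Some(label) => out.push_str(label),
            None => {
                // No matching entry: keep the original token as written.
                out.push('@');
                out.push_str(key);
            }
        }
        rest = &after[end..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut labels = HashMap::new();
    labels.insert("hegel2010logic", "Hegel 2010");
    let line = "becoming itself (@hegel2010logic, 81).";
    println!("{}", replace_keys(line, &labels));
}
```

Leaving unknown keys in place is a choice made for the sketch; verification would normally reject such keys before this step runs.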

## Additional Features

**Alphabetical Index Generation**
@@ -93,11 +105,21 @@ You can rewrite parts of generated index links using:
--link-prefix-rewrite "/content=/articles"
```

**Handling Ambiguities**

Version `0.4` introduces citation ambiguity handling. When an author has multiple
works in the same year, such as (Hegel 1991), which might refer to the Miller
translation of the Science of Logic or to the Encyclopaedia Logic, the program
returns an error with disambiguation suggestions by key. To resolve ambiguous citations,
use BibTeX keys prefixed with `@` in the citation, e.g. `(@hegel1991logic)`.

During `process` mode, keys are converted to disambiguated citations in Chicago author-date style.
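
One way to picture the ambiguity check is to group bibliography entries by author and year and flag every group that holds more than one key. The sketch below does that with a simplified `BibEntry` struct; the struct, the function `ambiguous_groups`, and the key `hegel1991encyclopaedia` are assumptions for illustration and do not mirror the crate's internals, which work on `biblatex` entries.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a bibliography entry (hypothetical type).
struct BibEntry {
    key: &'static str,
    author: &'static str,
    year: u16,
}

/// Groups entries by (author, year) and keeps only groups with more than one key.
fn ambiguous_groups(entries: &[BibEntry]) -> BTreeMap<(&str, u16), Vec<&str>> {
    let mut groups: BTreeMap<(&str, u16), Vec<&str>> = BTreeMap::new();
    for e in entries {
        groups.entry((e.author, e.year)).or_default().push(e.key);
    }
    // A citation such as (Hegel 1991) is ambiguous only if several keys share it.
    groups.retain(|_, keys| keys.len() > 1);
    groups
}

fn main() {
    let entries = [
        BibEntry { key: "hegel1991logic", author: "Hegel", year: 1991 },
        BibEntry { key: "hegel1991encyclopaedia", author: "Hegel", year: 1991 },
        BibEntry { key: "hegel2010logic", author: "Hegel", year: 2010 },
    ];
    for ((author, year), keys) in ambiguous_groups(&entries) {
        // Suggest the keys the writer could cite instead of the inline form.
        let suggestions: Vec<String> = keys.iter().map(|k| format!("@{}", k)).collect();
        println!("({} {}) is ambiguous; use one of: {}", author, year, suggestions.join(", "));
    }
}
```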

## Limitations

The tool currently only supports citations in Chicago author-date style.
Only book and article entries are currently supported (plans to support more types in the future).
Only the following metadata fields from the target `.mdx` files are supported:
Only book entries are currently supported (plans to support more types in the future).
Only the following metadata fields are supported:

- author
- editor
26 changes: 26 additions & 0 deletions src/errors.rs
@@ -0,0 +1,26 @@
use std::fmt;

/// Validation errors when parsing contents of a file.
#[derive(Debug)]
pub enum CitationError {
    /// Two or more possible matches to a single citation. Requires disambiguation through unique key rather than inline citation style.
    AmbiguousMatch(String),

    /// Citations that did not find a match in the source `.bib` bibliography.
    UnmatchedCitations(Vec<String>),
}

impl fmt::Display for CitationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CitationError::AmbiguousMatch(details) => {
                write!(f, "Ambiguous citations found:\n{}", details)
            }
            CitationError::UnmatchedCitations(citations) => {
                write!(f, "Citations not found in the library: {:?}", citations)
            }
        }
    }
}

impl std::error::Error for CitationError {}
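
Because `CitationError` implements `std::error::Error`, it can be propagated with `?` into a `Box<dyn std::error::Error>` like the `run` function in the README. The sketch below repeats the enum from this file so it compiles on its own; `verify_stub` is a hypothetical stand-in for the real verification step, not a function from the crate.

```rust
use std::fmt;

#[derive(Debug)]
pub enum CitationError {
    AmbiguousMatch(String),
    UnmatchedCitations(Vec<String>),
}

impl fmt::Display for CitationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CitationError::AmbiguousMatch(details) => {
                write!(f, "Ambiguous citations found:\n{}", details)
            }
            CitationError::UnmatchedCitations(citations) => {
                write!(f, "Citations not found in the library: {:?}", citations)
            }
        }
    }
}

impl std::error::Error for CitationError {}

/// Hypothetical check: fails if any citation was left unmatched.
fn verify_stub(unmatched: Vec<String>) -> Result<(), CitationError> {
    if unmatched.is_empty() {
        Ok(())
    } else {
        Err(CitationError::UnmatchedCitations(unmatched))
    }
}

fn run() -> Result<(), Box<dyn std::error::Error>> {
    // `?` converts CitationError into Box<dyn Error> automatically.
    verify_stub(vec!["Hegel 2010".to_string()])?;
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("Error: {}", e);
    }
}
```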
12 changes: 7 additions & 5 deletions src/inserters.rs
@@ -1,11 +1,12 @@
use biblatex::Entry;
use itertools::Itertools;
use regex::Regex;
use std::collections::BTreeSet;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use validators::{ArticleFileData, Metadata};

use crate::transformers::transform_keys_to_citations;
use crate::validators::MatchedCitationDisambiguated;
use crate::{transformers, validators};

struct InserterOutcome {
@@ -114,7 +115,7 @@ pub fn generate_index_to_file(

fn process_mdx_file(article_file_data: ArticleFileData, inserter_outcome: &mut InserterOutcome) {
    let mut mdx_payload = String::new();
    let mdx_bibliography = generate_mdx_bibliography(article_file_data.matched_citations);
    let mdx_bibliography = generate_mdx_bibliography(&article_file_data.entries_disambiguated);

    let mdx_authors = generate_mdx_authors(&article_file_data.metadata);
    let mdx_notes_heading = generate_notes_heading(&article_file_data.markdown_content);
@@ -136,8 +137,9 @@ fn process_mdx_file(article_file_data: ArticleFileData, inserter_outcome: &mut I
        return;
    }

    let updated_markdown_content =
        format!("{}\n{}", article_file_data.full_file_content, mdx_payload);
    let full_file_content_disambiguated = transform_keys_to_citations(&article_file_data);

    let updated_markdown_content = format!("{}\n{}", full_file_content_disambiguated, mdx_payload);

    match write_html_to_mdx_file(&article_file_data.path, &updated_markdown_content) {
        Ok(_) => {
@@ -165,7 +167,7 @@ fn append_to_file(path: &str, content: &str) -> std::io::Result<()> {
    Ok(())
}

fn generate_mdx_bibliography(entries: Vec<Entry>) -> String {
fn generate_mdx_bibliography(entries: &Vec<MatchedCitationDisambiguated>) -> String {
    let mut bib_html = String::new();

    if entries.is_empty() {
41 changes: 38 additions & 3 deletions src/lib.rs
@@ -10,7 +10,7 @@ Add the crate to your `Cargo.toml` and use it as shown below:

```toml
[dependencies]
prepyrus = "0.2"
prepyrus = "<latest_version>"
```

Main API interface is the `Prepyrus` impl. Example usage:
@@ -34,12 +34,13 @@ fn run() -> Result<(), Box<dyn std::error::Error>> {
    // Example Command Line Inputs
    let cli = Cli {
        bib_file: "tests/mocks/test.bib".to_string(),
        target_path: "tests/mocks/data".to_string(),
        target_path: "tests/mocks/data-isolated".to_string(),
        mode: Mode::Verify,
        ignore_paths: Some(vec!["tests/mocks/data/development.mdx".into()]),
        generate_index_to_file: None,
        index_link_prefix_rewrite: None,
    };
    // Normally one would use let cli = Prepyrus::parse_cli();

    let config = Prepyrus::build_config(cli, None)?;
    let all_entries = Prepyrus::get_all_bib_entries(&config.bib_file).unwrap();
@@ -66,17 +67,50 @@ fn run() -> Result<(), Box<dyn std::error::Error>> {

## Description

The tool is designed to work with MDX files that contain citations in Chicago author-date style. Examples:
The tool is designed to work with MDX files that contain citations in Chicago author-date style or by BibTeX key. Examples:

> "...nowhere on heaven or on earth is there anything which does not contain both being and nothing in itself" (Hegel 2010, 61).

> "The equilibrium in which coming-to-be and ceasing-to-be are poised is in the first place becoming itself" (@hegel2010logic, 81).

> "Existence proceeds from becoming" (see Hegel 2010, 61).

The tool parses and verifies the citations in the MDX files against a
bibliography file in BibTeX format (using Biblatex).
If the citations are valid, the tool processes the MDX files
by adding a bibliography section at the end of the file.
It also adds author, editor, and contributor from the MDX file metadata if available.
Finally, it also adds a notes heading at the end if footnotes are present in the file.

If BibTeX keys are used, they are replaced by disambiguated citations during `process` mode.

## Additional Features

**Alphabetical Index Generation**

When running in process mode with the `--generate-index-file <TARGET_FILE>` option, Prepyrus now:

- Extracts all `indexTitles` from .mdx files.
- Sorts them alphabetically by title.
- Groups them under ## headings by first letter (e.g., ## A, ## B, etc).
- Writes a neatly structured index to the specified .mdx file.

You can rewrite parts of generated index links using:

```txt
--link-prefix-rewrite "/content=/articles"
```

**Handling Ambiguities**

Version `0.4` introduces citation ambiguity handling. When an author has multiple
works in the same year, such as (Hegel 1991), which might refer to the Miller
translation of the Science of Logic or to the Encyclopaedia Logic, the program
returns an error with disambiguation suggestions by key. To resolve ambiguous citations,
use BibTeX keys prefixed with `@` in the citation, e.g. `(@hegel1991logic)`.

During `process` mode, keys are converted to disambiguated citations in Chicago author-date style.

## Limitations

The tool currently only supports citations in Chicago author-date style.
@@ -101,6 +135,7 @@ Apache-2.0
*/

pub mod cli;
pub mod errors;
pub mod inserters;
pub mod transformers;
pub mod utils;