
sarit/raw-ocr-data


Raw double-keyboarding data for Sanskrit texts

This repository contains the raw double-keyboarding data for projects undertaken at Columbia University between 2012 and 2015. In those projects, TEI markup was applied to this data, and as the texts were completed, they were moved to the SARIT-corpus repository, the authoritative repository for texts produced in the SARIT project.

Scripts

Each raw file is available in the Devanagari script (in which it was originally typed), as well as in automatically produced transliterations in the IAST and ISO-15919 romanization schemes.
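The two romanization schemes differ in only a handful of characters for Sanskrit. The following is a minimal Python sketch of those differences as they are commonly summarized; it is not the tool that produced the transliterations in this repository, which is not documented here. The function name `iast_to_iso15919` and the mapping table are illustrative assumptions, and only lowercase letters are handled.

```python
import unicodedata

# Characters where IAST and ISO 15919 are commonly described as differing
# for Sanskrit (lowercase only). This table is an illustrative assumption,
# not taken from the repository. Escapes keep the combining marks unambiguous.
IAST_TO_ISO = str.maketrans({
    "\u1e5b": "r\u0325",        # vocalic r: dot below -> ring below
    "\u1e5d": "r\u0325\u0304",  # long vocalic r: ring below + macron
    "\u1e37": "l\u0325",        # vocalic l: dot below -> ring below
    "\u1e39": "l\u0325\u0304",  # long vocalic l: ring below + macron
    "\u1e43": "\u1e41",         # anusvara: m with dot below -> m with dot above
    "e": "\u0113",              # e -> e with macron
    "o": "\u014d",              # o -> o with macron
})


def iast_to_iso15919(text: str) -> str:
    """Convert lowercase IAST Sanskrit text to ISO 15919 (sketch)."""
    # Compose first so that decomposed input (base letter + combining dot)
    # matches the precomposed keys in the table.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(IAST_TO_ISO)


if __name__ == "__main__":
    print(iast_to_iso15919("sa\u1e43sk\u1e5bta"))
```

Because `ai` and `au` contain neither `e` nor `o`, the single-character replacements leave the diphthongs untouched.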

Markup

Most of the raw files include some markup that was introduced during double-keyboarding.
