Technology & Tools
The tools and technology used by Newbook Digital Texts in the Humanities (NDTH) make up a "clear path to output" process for texts prepared in a modified version of the Text Encoding Initiative (TEI) encoding scheme. The goal of this process is to produce well-formed, valid, structured data from literary, historical, pedagogical, and other sources that are not readily available in print, in multiple languages and scripts.
NDTH uses a set of freely available open-source tools, tools developed by NDTH, and commercial software that is freely available for non-commercial projects (e.g., the Cladonia Exchanger XML Editor). Together they support a three-phase process:
- Producing generic auto-tagged texts to create TEI-conforming XML output
- Customizing the TEI-encoded text for individual projects
- Processing TEI-tagged texts to create standardized output for Web and Print
Tools and Samples: The Newbook Process
- Generic auto-tagging tools simplify the structural tagging needed to create TEI-conforming XML input (e.g., by student interns). Tags supported by NDTH: see the step-by-step auto-tagger instructions at depts.washington.edu/newbook/autotagger/
- The resulting generic TEI texts can then be hand-coded with additional TEI tags to meet the requirements unique to individual projects (see examples from EBA and GDTC/DTCG).
- XSLT processors transform the TEI-tagged texts into standardized output for Web and Print: XHTML/HTML5, PDF (via LaTeX), and EPUB.
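The auto-tagging step can be sketched in Python to show what "generic" structural tagging produces. This is not the JMS Auto-tagger (which is Perl-based) and does not reproduce its rules. The input conventions are assumptions made for illustration: blank lines separate blocks, a block starting with `#` becomes a `<div>` heading, and every other block becomes a `<p>`. The name `autotag` is also hypothetical. The output does carry the minimal `teiHeader` that TEI requires: a `fileDesc` with title, publication, and source statements.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def autotag(plain_text: str, title: str) -> ET.ElementTree:
    """Wrap plain text in a minimal TEI document.

    Hypothetical input conventions (NOT the JMS Auto-tagger's rules):
    blocks are separated by blank lines; a block starting with '#'
    opens a new <div> whose <head> is the rest of that block; any other
    block becomes a <p> with its line breaks collapsed to spaces.
    """
    root = ET.Element(tei("TEI"))

    # Minimal header: fileDesc needs titleStmt, publicationStmt, sourceDesc.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished draft"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Auto-tagged from plain text"

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    div = None
    for block in re.split(r"\n\s*\n", plain_text):
        text = " ".join(block.split())  # collapse internal whitespace
        if not text:
            continue
        if text.startswith("#"):
            div = ET.SubElement(body, tei("div"))
            ET.SubElement(div, tei("head")).text = text.lstrip("#").strip()
        else:
            if div is None:  # text before any heading gets its own div
                div = ET.SubElement(body, tei("div"))
            ET.SubElement(div, tei("p")).text = text
    return ET.ElementTree(root)


if __name__ == "__main__":
    sample = "# Chapter 1\n\nFirst paragraph\nwraps here.\n\nSecond paragraph."
    print(ET.tostring(autotag(sample, "Sample").getroot(), encoding="unicode"))
```

Project-specific hand-coding in the second phase would then start from generic output of this kind.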
Below is a link to a sample document that can be processed with the tools above to produce output. It may also serve as a model for marking up your own data for use with our tools.
- Sample Valid TEI Project Document with Georgian Script
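NDTH produces its Web output with XSLT. Python's standard library has no XSLT engine, so the sketch below substitutes a small ElementTree mapping to show the same idea: it reads a TEI document containing Georgian script and emits HTML5. The element mapping (each `div` becomes a `section`, `head` becomes `h2`, `p` stays `p`), the placeholder Georgian text, and the name `tei_to_html` are all assumptions. None of them come from the NDTH sample or stylesheets. To run real XSLT stylesheets from Python, `lxml.etree.XSLT` is one option; `xsltproc` does the same on the command line.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI -> HTML mapping for children of each <div>.
TEI_TO_HTML = {TEI + "head": "h2", TEI + "p": "p"}

# Placeholder sample with Georgian text (not the NDTH sample document).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="ka">
  <teiHeader><fileDesc>
    <titleStmt><title>ვეფხისტყაოსანი</title></titleStmt>
    <publicationStmt><p>Sample</p></publicationStmt>
    <sourceDesc><p>Sample</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body>
    <div><head>პროლოგი</head><p>ქართული ტექსტი</p></div>
  </body></text>
</TEI>"""


def tei_to_html(tei_xml: str) -> str:
    """Render a simple TEI document as an HTML5 string."""
    root = ET.fromstring(tei_xml)
    lang = root.get(XML_LANG, "en")  # carry xml:lang over to <html lang>
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", default="", namespaces=NS
    )

    html = ET.Element("html", {"lang": lang})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "meta", {"charset": "utf-8"})
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title

    for div in root.iterfind("tei:text/tei:body/tei:div", NS):
        section = ET.SubElement(body, "section")
        for child in div:
            html_tag = TEI_TO_HTML.get(child.tag)
            if html_tag:
                # itertext() flattens any inline markup to plain text in this sketch
                ET.SubElement(section, html_tag).text = "".join(child.itertext())
    return "<!DOCTYPE html>\n" + ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    print(tei_to_html(SAMPLE_TEI))
```

Because the output is written as Unicode with a `utf-8` charset declaration, the Georgian characters pass through unchanged.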
Further details are available on our GitHub.
The software tools listed below are readily available online. Scripts and document samples developed by NDTH can be downloaded from this site.
- UTF-8 text editors: Notepad++ (Windows), TextWrangler (OS X), vi, Emacs
- Cladonia Exchanger XML Editor (Java-based, multi-platform): CHECK XML for well-formedness and validity
- xmllint (Unix/Linux): DETECT errors in XML output
- JMS Auto-tagger (Perl-based): CONVERT plain text transcripts to TEI-XML
- XSLT scripts: CONVERT valid TEI-XML to XHTML/HTML5, LaTeX, tag-set lists
- TeX Live/MiKTeX: CONVERT LaTeX sources to PDF
- W3C Markup Validation Service: VALIDATE XHTML/HTML5 output
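xmllint can report well-formedness errors (e.g., `xmllint --noout file.xml`), and the XSLT scripts above can produce tag-set lists. Both tasks can be approximated with Python's standard library, as in the sketch below. It checks well-formedness only. Validating against a TEI schema still requires a validating tool such as Exchanger XML or `xmllint --relaxng`. The function names `check_well_formed` and `tag_set` are hypothetical, and the output format is not the format of the NDTH tag-set lists.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed, else the parser's error message.

    The message includes the line and column of the error. This checks
    well-formedness only; it does NOT validate against a TEI schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


def tag_set(xml_text: str) -> Counter:
    """Count how often each element name (namespace stripped) occurs."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


if __name__ == "__main__":
    doc = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>a</p><p>b</p></body></text></TEI>'
    print(check_well_formed(doc))  # None
    print(check_well_formed("<TEI><p>oops</TEI>"))  # a "mismatched tag" error on line 1
    for name, count in sorted(tag_set(doc).items()):
        print(name, count)
```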