2. Downloading
Discover all the formats available for downloading your documents from Transkribus, so you can store, publish or further analyse your transcriptions.
If you want to work with your images and transcriptions outside of Transkribus, you can download your documents from the platform. Transkribus offers a variety of export options to fit your project's needs, allowing you to choose from different file formats and structures.
Below, we outline the steps and options available for exporting your work, depending on your subscription plan.

The export window will open. Choose an export option from the list, then click Start export.

Standard export options (available with all subscription plans)
- Images: Export pages as image files (JPEG).
- Docx files: Export documents as Microsoft Word files.
- Transkribus PDF: Export documents as PDFs with embedded text.
- Text Files (TXT): Export the transcribed text as .txt files.
- Page XML: Export Page XML files for further technical use or analysis. Page XML is an XML-based format that captures image characteristics, layout structure, and page content, with one XML file for each page. The complete format definition for Transkribus can be found here.
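If you plan to process the Page XML export with your own scripts, the following Python sketch shows one way to pull the transcribed lines out of a page file. It assumes the common PAGE layout in which each TextLine element holds its text in a TextEquiv/Unicode child. The function name page_xml_lines and the simplified sample document are illustrative and not part of Transkribus. The namespace is read from the file itself rather than hard-coded, since the schema version may differ between exports.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample (real exports also contain coordinates and metadata).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="r1l1"><TextEquiv><Unicode>First line</Unicode></TextEquiv></TextLine>
      <TextLine id="r1l2"><TextEquiv><Unicode>Second line</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode>First line Second line</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def page_xml_lines(xml_text):
    """Return the text of every TextLine in a PAGE XML string, in file order."""
    root = ET.fromstring(xml_text)
    # Reuse the namespace declared on the root element, e.g. "{http://...}".
    prefix = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    lines = []
    for line in root.iter(prefix + "TextLine"):
        # Only the TextEquiv that is a direct child of the line is read,
        # so region-level and word-level text is not counted twice.
        unicode_el = line.find(prefix + "TextEquiv/" + prefix + "Unicode")
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines


if __name__ == "__main__":
    for text in page_xml_lines(SAMPLE):
        print(text)
```

To use it on a downloaded file, read the file's content into a string first and pass it to page_xml_lines.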
Advanced export options (available with Scholar, Team, and Organisation plans)
- Export Structural Elements to Mets: A METS (Metadata Encoding and Transmission Standard) file is an XML container that holds the descriptive, administrative, and structural metadata of a digital object. For more in-depth insights, please visit the METS page.
- DOCX Document: Export to Microsoft Word format for editing purposes.
  - Preserve line breaks: This option maintains the original line breaks found in the transcription, making the DOCX file closely resemble the layout of the transcribed document.
  - Force page breaks: Selecting this will insert a page break in the DOCX file at the end of each page in the Transkribus document. This ensures that the start of a new page in the transcription corresponds to a new page in the Word document.
  - Mark unclear words: If your transcription includes words tagged as "unclear" (often due to poor source image quality or illegible handwriting), this option allows you to highlight or mark these unclear words in the exported DOCX document.
  - Write image name before text: This option adds the image name before the transcription of each page. You can choose the filename pattern (pageNr + file name; file name; docId + pageNr + pageId).
- PDF Document: When exporting a document as a PDF, you can select how your images and text will be included in the final PDF file. Here are the options you can choose from:
  - Image plus text layer: This option embeds both the page images and the transcribed text layer into the PDF. The text layer is searchable, and the term you are looking for will be highlighted (please note that the highlighted area may not perfectly match the word in the image because the word coordinates are determined from the lines with a certain degree of fuzziness).
  - Images only: Select this option if you prefer having a PDF containing only the images of your document pages, without any overlaying text layer.
  - Extra text pages: Opting for extra text pages will include an additional page in your PDF for each original page, containing only the transcribed text.
  - Highlight tags: If your document includes tagged text (for example, names, places, or specific annotations), you can choose to highlight these tags in the PDF.
  - Highlight articles: For documents that contain structured articles or sections, this option allows you to visually highlight the boundaries and titles of each article within the PDF.
  - PDF Type: You can choose between Standard PDF and PDF/A.
- Text files (txt): Export the transcribed text as .txt files. You can choose to generate one combined file for the entire document or selected pages, or create a separate file for each page.
- Spreadsheet (Excel/CSV): Exporting metadata or transcribed text to a spreadsheet for data analysis offers four options: Table Export, Tag Export, Structural region export, and Page Metadata.
  - Table export: This option is designed for exporting transcribed data that has been structured as tables within your documents. It allows for further customisation of how your tabular data is presented in the exported spreadsheet:
    - Merge into one table: This option combines data from all selected tables across your documents into a single table within the spreadsheet.
    - Export single column with image snippets: By choosing this, the exported spreadsheet will include a single column containing image snippets of the table cells, in addition to the transcribed text. This visual representation can be particularly helpful for quick data verification or when the information's visual context is important.
  - Tag export: If you've used textual tags to mark specific elements (e.g., names, dates, places) within your transcription, this export type allows you to compile all tagged elements into a spreadsheet, facilitating easy access and analysis of specific data points.
  - Structural region export: If you have applied structural tags to your document, you can export your transcriptions in a structured, tabular format. This is particularly useful when working with Field models, to transform transcriptions directly into a clean dataset.
    In this export format, the document is mapped onto a grid, with each row representing a single page and each column a unique structural tag. The resulting cells contain the specific transcription associated with that tag on that page. Because each tag represents a unique column, ensure that a specific structural tag is used only once per page to maintain data accuracy.
  - Page metadata: Choose this option to export metadata associated with each page of your documents, such as page numbers, titles, or any custom metadata you've added.
- ALTO XML: Export in ALTO XML format, often used for digital library collections. As a special export detail, you can choose to split lines into words.
  ALTO is a specialised format that allows you to export your document for use in other programs. When selecting this format, you will receive an XML file for each page, containing both the content and layout information. It is commonly used alongside METS to describe the entire digitized object and establish connections between ALTO files, such as sequencing information. For more details, please visit ALTO.
- TEI XML: Export in TEI XML format, suitable for academic research and digital humanities projects. You have a choice between two stylesheet options, each catering to different project requirements:
  - Standard: This option refers to Transkribus's default TEI XML export format. It is designed to provide broad compatibility with TEI guidelines, ensuring that your exported document adheres to widely accepted standards for encoding textual data.
  - Page2tei: Choosing the page2tei option allows you to export your document using a specific XSL transformation, created by Dario Kampkaspar. This transformation is available on GitHub: page2tei.
    This option is particularly useful for those requiring a more customised or detailed approach to TEI XML encoding. With Page2TEI, you will download a TEI XML file that is specifically structured according to the guidelines and enhancements provided by the Page2TEI XSL transformation. This can include more detailed representations of page layouts, specialized tagging, and other features that are not covered by the standard TEI XML export.
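The grid described under Structural region export can be pictured with a short Python sketch. It is only a model of the mapping, not the Transkribus implementation: the functions build_grid and grid_to_csv, the sample tags and values, and the choice to keep the first value when a tag repeats on a page are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input: (page number, structural tag, transcription) triples.
ENTRIES = [
    (1, "name", "Anna Huber"),
    (1, "date", "3 May 1871"),
    (2, "name", "Josef Mayr"),
    (2, "name", "Maria Mayr"),  # same tag twice on page 2
]


def build_grid(entries):
    """Map entries onto rows (one per page) and columns (one per tag)."""
    tags = []                 # column order: first appearance
    rows = defaultdict(dict)  # page -> {tag: text}
    duplicates = []           # (page, tag) pairs that occurred more than once
    for page, tag, text in entries:
        if tag not in tags:
            tags.append(tag)
        if tag in rows[page]:
            # One cell cannot hold two values; this sketch keeps the first
            # one and records the conflict. A real export may behave differently.
            duplicates.append((page, tag))
            continue
        rows[page][tag] = text
    return tags, dict(rows), duplicates


def grid_to_csv(tags, rows):
    """Serialise the grid as CSV with a leading page column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", *tags])
    for page in sorted(rows):
        writer.writerow([page, *(rows[page].get(tag, "") for tag in tags)])
    return buffer.getvalue()


if __name__ == "__main__":
    tags, rows, duplicates = build_grid(ENTRIES)
    print(grid_to_csv(tags, rows))
    print("Tags used more than once on a page:", duplicates)
```

In the sample, page 2 carries the name tag twice, so only one value fits into the cell and the pair is reported as a conflict. This is why each structural tag should be used only once per page.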
Expand Tag Settings
Some export options include an additional "Expand tag settings" section, providing further customisation for textual tag exports.
- No tags: Choose this option if you do not wish to include any textual tag information in your exported document.
- Export all tags in document: This will include all textual tags present in your document.
- Export only selected tags: If you only want specific textual tags to be included and highlighted in your document, select this option. You'll need to specify which tags are to be exported in a separate tag selection window.
Accessing exported files
Once you initiate the export task, your download job will be processed on the Transkribus server. You'll receive an email with a link to download your files, which will be valid for two weeks. To track the download progress, check the status in the Processes & Activity Section under "Up- & Downloads".
Additionally, in the Up- & Downloads overview table, you can download the file directly by clicking the three dots on the far right and selecting the Download option.
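After downloading, you may want a quick check of what the export contains. The Python sketch below counts the files in a downloaded archive by file extension. It assumes the download is a ZIP archive, which this guide does not state. The function name summarise_export and the file name export.zip are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarise_export(zip_path):
    """Count the files in an archive by extension (assumes a ZIP download)."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries, which end with a slash in ZIP listings.
        names = [name for name in archive.namelist() if not name.endswith("/")]
    return Counter(PurePosixPath(name).suffix.lower() or "(none)" for name in names)


# Example call, with a hypothetical file name:
# for extension, count in sorted(summarise_export("export.zip").items()):
#     print(extension, count)
```

Comparing the counts with the number of pages in your document is a simple way to confirm that the export is complete before you delete or archive anything.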