Stars
Tero Subtitler is an open source, cross-platform, and free subtitle editing software.
ocr-docker is a small Flask-powered web app that extracts text from images and PDF documents using OCR
Vrecord is open-source software for capturing a video signal and turning it into a digital file.
Rails application supporting the creation of OCR and the IIIF Content Search API
A client library for working with the ArchivesSpace API
This Guidance demonstrates how to validate checksums for compliance and audit requirements with an on-demand fixity check process.
A React audio player & transcription viewer.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Identify, review, and remove sensitive files
A tool for creating and managing Mailbags, a package for preserving email using multiple preservation formats
Serverless replay of web archives directly in the browser
Download an entire website from the Wayback Machine.
A feature-rich command-line audio/video downloader
A Rails engine supporting discovery of archival material
brozzler - distributed browser-based web crawler
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
This repository shares NARA-created open source software to support federal agencies in their preparation of metadata and permanent electronic records for transfer to NARA.
Uploads and downloads file inventories to and from ArchivesSpace
This is the general workflow to make archival information packages (AIPs) that are ready for ingest into the UGA Libraries' digital preservation system (ARCHive). The workflow organizes files, extr…
A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.