EPPT allows users to encode image-based scholarly editions without having to know XML syntax. It automates or semi-automates repeating attributes, and provides templates to reduce errors and accelerate the encoding process.
A tool to convert normal or scanned PDFs and images to Word, Excel, PPT, Keynote, Pages, Text, etc. on Mac.
- Convert PDF to Word (.doc), Excel (.xlsx), and More Common Office Format Files
- Convert PDF to Pages and Keynote
- Convert PDF to Graphics Files
- Convert Scanned PDF with Accurate OCR
- Convert Multilingual PDF Files
- Support Password-Restricted PDF Files
Part-of-Speech (POS) tagging software for English. POS tagging, also known as wordclass tagging, is the classification of words into one or more categories based on their definition, relationship with other words, or other context. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably Hidden Markov models (HMMs), which involve counting examples of co-occurrence of words and wordclasses in training data and making a table of the probabilities of certain sequences of words.
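As a rough illustration of the HMM idea (not CLAWS's own code or tagset), the sketch below runs Viterbi decoding over a toy bigram model. The tag names, the probability tables, and the `viterbi` function are all hypothetical; in practice the tables would be estimated by counting tag and word co-occurrences in a tagged training corpus.

```python
import math

# Toy tables, made up for illustration only (assumption: a real tagger
# would derive these from counts in tagged training data).
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
TRANS_P = {
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.2, ("NOUN", "DET"): 0.1,
    ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.1,
}
EMIT_P = {
    ("DET", "the"): 0.9,
    ("NOUN", "dog"): 0.5, ("NOUN", "barks"): 0.1,
    ("VERB", "barks"): 0.6, ("VERB", "dog"): 0.05,
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for a non-empty list of words."""
    def logp(table, key):
        # Unseen events get a small floor probability instead of zero.
        return math.log(table.get(key, floor))

    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (logp(start_p, t) + logp(emit_p, (t, words[0])), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] + logp(trans_p, (p, t)) + logp(emit_p, (t, word)), p)
                for p in tags
            )
        best.append(column)

    # Follow the back-pointers from the best final tag to the start.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log-probabilities avoids numeric underflow on long sentences; the `floor` value is a crude stand-in for proper smoothing.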
Smallpdf is a free online tool to compress, merge, split and convert PDF documents. It is simple to use and can be useful for compressing research papers, merging several documents together or extracting graphs and images from PDF files.
CloudConvert supports the conversion between more than 200 different audio, video, document, ebook, archive, image, spreadsheet and presentation formats.
The CloudConvert API offers the full functionality of CloudConvert and makes it possible to use the conversion services in your own applications.
Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If that’s not good enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.
It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.
Overview is built to do several types of tasks:
Audacity is a free, easy-to-use and multilingual audio editor and recorder. Basic features, as listed on their website, include:
- Record live audio.
- Record computer playback on any Windows Vista or later machine.
- Convert tapes and records into digital recordings or CDs.
- Edit WAV, AIFF, FLAC, MP2, MP3 or Ogg Vorbis sound files.
- Cut, copy, splice or mix sounds together.
- Change the speed or pitch of a recording.
Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.
This code is a raw OCR engine. It has no output formatting and no UI. It can detect fixed pitch vs proportional text. Nevertheless, in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.
The core developer on the project is Ray Smith (theraysmith).
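Because the engine has no UI, it is typically driven from the command line or from a script. The sketch below wraps the `tesseract` command-line program from Python. It assumes `tesseract` is installed and on the PATH, and that the installed version accepts `stdout` as the output base (recent releases do); the helper names `tesseract_command` and `ocr_image` are hypothetical.

```python
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the argument list for one OCR run.

    Passing "stdout" as the output base asks tesseract to print the
    recognised text instead of writing a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on an image and return the recognised text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

For example, `ocr_image("page.png", lang="deu")` would return the text of a German page, provided the corresponding language data is installed.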
Google Docs is an online environment for editing and sharing documents, spreadsheets, presentations, forms, drawings, and tables. Google Docs documents can be public or private, or shared with anyone with a Google account, e-mailed, or downloaded in various formats, including conversions to PDF and other formats not identical to the original or to the proprietary format used at creation. Designated people with whom items are shared can be given permission to comment or edit the files, thus providing a quick way to collaborate on creating and editing documents and presentations.
TwapperKeeper is now called Hootsuite Archives and can be accessed from within Hootsuite.
CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.
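To make the listed statistics concrete, here is a minimal sketch in Python rather than R. It does not use or reproduce the CulturalAnalytics package's API; the function names, the pixel-list input format, and the greyscale weights (the common ITU-R BT.601 luma coefficients) are assumptions chosen for illustration.

```python
import colorsys
import math
from collections import Counter
from statistics import mean, pstdev


def channel_stats(pixels):
    """Mean and standard deviation per channel in RGB and HSV.

    pixels: list of (r, g, b) tuples with integer values 0-255.
    Returns {"R": (mean, std), ..., "V": (mean, std)}; HSV values are in 0-1.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    stats = {}
    for names, rows in (("RGB", pixels), ("HSV", hsv)):
        for i, name in enumerate(names):
            values = [row[i] for row in rows]
            stats[name] = (mean(values), pstdev(values))
    return stats


def grey_entropy(pixels):
    """Shannon entropy, in bits, of the greyscale intensity histogram."""
    grey = [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]
    counts = Counter(grey)  # histogram: intensity -> number of pixels
    n = len(grey)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# A tiny half-black, half-white "image".
sample = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)]
print(channel_stats(sample)["R"], grey_entropy(sample))
```

An image with a single flat colour has entropy 0, while one whose pixels are spread evenly over all 256 intensities reaches the maximum of 8 bits.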
Map Warper is a tool for digitally aligning ("rectifying") historical maps to match today's precise maps. It is used publicly by the NYPL to crowdsource georectification of their own library of digitised historical maps.
In the wider version developed by Tim Waters, user-supplied maps can be georectified for subsequent use in your own mapping projects.
VoxcribeCC has the most accurate speaker-independent and topic-independent desktop speech recognition technology. It is used for media (audio/video) transcription and video captioning.
Watch the VoxcribeCC Usage Video to learn how to use VoxcribeCC in just 2 minutes.
OxGarage is a RESTful web service that manages the transformation of documents between a variety of formats. The majority of transformations use the Text Encoding Initiative format as a pivot format.
OxGarage is based on the Enrich Garage Engine developed by Poznan Supercomputing and Networking Center and Oxford University Computing Services for the ENRICH project.
See the conversion matrix for details.
Importing, transforming, storing and indexing data should be easy.
Catmandu provides a suite of Perl modules to ease the import, storage, retrieval, export and transformation of metadata records. Combine Catmandu modules with web application frameworks such as PSGI/Plack, document stores such as MongoDB and full text indexes such as Solr to create a rapid development environment for digital library services such as institutional repositories and search engines.
Praat is software for the phonetic analysis of speech, including support for articulatory and speech synthesis.
VARD 2 is an interactive piece of software produced in Java designed to assist users of historical corpora in dealing with spelling variation, particularly in Early Modern English texts. The tool is intended to be a pre-processor to other corpus linguistic methods such as keyword analysis, collocations and annotation (e.g. POS and semantic tagging), the aim being to improve the accuracy of these tools.
AGTK is a suite of software components for building tools for annotating linguistic signals, that is, time-series data (e.g. audio, video) that document any kind of linguistic behavior. The internal data structures are based on annotation graphs, a formal framework for representing linguistic annotations of time-series data.
Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.
CHET-C, or Chapel Hill Electronic Text-Converter, is a browser based software tool designed to convert digital texts that employ standard epigraphic conventions such as the Leiden sigla into EpiDoc-compliant XML files.
The tool can be accessed online at http://www.stoa.org/projects/epidoc/stable/chetc-js/chetc.html. Fragments of epigraphic text using standard sigla (e.g. Leiden convention markup) are pasted into the tool and EpiDoc-compliant XML is generated.
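The kind of conversion involved can be sketched with a few regular expressions. This is not CHET-C's own rule set: the function `leiden_to_epidoc` and the `RULES` table are hypothetical, they cover only three simple sigla, and the element choices follow common EpiDoc practice as an assumption.

```python
import re
from xml.sax.saxutils import escape

# Order matters: double brackets must be handled before single brackets.
RULES = [
    # [[abc]] : text erased in antiquity
    (re.compile(r"\[\[(.+?)\]\]"), r'<del rend="erasure">\1</del>'),
    # [abc] : lost text restored by the editor
    (re.compile(r"\[([^\[\]]+)\]"), r'<supplied reason="lost">\1</supplied>'),
    # a(bc) : abbreviation expanded by the editor
    (re.compile(r"(\w+)\((\w+)\)"), r"<expan><abbr>\1</abbr><ex>\2</ex></expan>"),
]


def leiden_to_epidoc(text):
    """Convert a small subset of Leiden sigla to EpiDoc-style XML."""
    xml = escape(text)  # protect any literal &, < or > in the transcription
    for pattern, replacement in RULES:
        xml = pattern.sub(replacement, xml)
    return xml


print(leiden_to_epidoc("Imp(erator) [Caes]ar"))
```

Real transcriptions need far more care (nested sigla, multi-part abbreviations such as co(n)s(ul), line breaks, underdotted letters), which is why a dedicated converter is preferable to ad hoc patterns like these.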
HandBrake is an open-source, GPL-licensed, multiplatform, multithreaded video transcoder.
Insync extends Google Drive's web functionality to your desktop by integrating with Windows, Mac and Linux platforms. Insync allows for built-in sharing without a browser, multiple account support, on-demand shared file syncing, desktop notifications and more.
Best Media Converter for Mac to convert video or rip DVD to 160+ formats, burn to DVD, download online video, and share to YouTube easily.
- Convert video/audio files and rip DVD to 160+ formats including 4K Ultra HD
- Burn video to DVD disc, DVD folder and ISO files
- Batch download streaming videos from 50+ popular Websites
- Share to YouTube with a pop-up login window instantly