Frequently Asked Questions of the Oxford Text Archive
Table of contents
- 1. Most Frequent Question
- 2. About the OTA
- 2.1. What is the OTA?
- 2.2. Who funds the OTA?
- 2.3. Where is the OTA?
- 2.4. How much do your services cost?
- 2.5. What projects are you involved in?
- 2.6. What plans do you have for future developments?
- 2.7. Who works at the OTA?
- 2.8. What was the AHDS?
- 2.9. When did the OTA start?
- 2.10. Is the OTA part of the University of Oxford?
- 2.11. Is the OTA part of the Oxford University Press?
- 2.12. Is the OTA part of CLARIN?
- 3. Searching and Downloading
- 3.1. Do you only have texts?
- 3.2. What's the best way to find what I'm looking for?
- 3.3. Why do some texts seem to appear more than once?
- 3.4. What formats are the resources in?
- 3.5. What is the TEI?
- 3.6. Are all your resources in TEI XML?
- 3.7. What is SGML?
- 3.8. What is XML?
- 3.9. Why does it say 'unknown format' in the description of some resources?
- 3.10. What is the difference between a freely available text and a restricted one?
- 3.11. Why do some resources require asking for permission?
- 3.12. Why do I have to give you my email before downloading?
- 3.13. What will you do with my personal information?
- 3.14. I requested a freely available text where is it?
- 3.15. I requested a restricted text where is it?
- 3.16. Do you distribute the BNC?
- 3.17. Is there a printed catalogue available?
- 3.18. What languages are the resources in?
- 4. Depositing
The resources in the OTA do not rely on file suffixes to indicate what application should be used with them. You shouldn't expect your computer operating system to be able to identify the type of a file and suggest an appropriate program to process it. This would require your computer to know all about the various practices that have been followed in the past 30 years of text encoding in the Humanities, and it would need to know how you wish to process the file. Resources in the OTA are made available for the purposes of scholarly research, so we assume some familiarity with electronic text, but we don't assume that we know what you want to do with the files.
Having said that, the files that you have downloaded are probably plain text files, and you could try using a text editor or a web browser to see what is in them. Unfortunately, the OTA does not have the resources to deal with queries about problems relating to opening files.
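If you would rather check a file programmatically before opening it, the following is a minimal sketch in Python. It is not an OTA tool: the function name, the labels it returns and the 2048-byte sample size are all assumptions made for illustration, and the guess it produces is only a rough hint.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_text_format(data: bytes) -> str:
    """Return a rough, heuristic label for the start of a file.

    Hypothetical helper for illustration; the labels are not OTA terms.
    """
    head = data[:2048]
    # Drop a UTF-8 byte order mark if one is present.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    # NUL bytes are very unlikely in plain text or markup.
    if b"\x00" in head:
        return "binary"
    lowered = head.lstrip().lower()
    if lowered.startswith(b"<?xml"):
        return "xml"
    if lowered.startswith(b"<"):
        return "markup (SGML or XML)"
    return "plain text"


# To check a real download, read its bytes first, for example:
#   from pathlib import Path
#   print(sniff_text_format(Path("downloaded-file").read_bytes()))
# ("downloaded-file" is a placeholder name.)
print(sniff_text_format(b"<!DOCTYPE TEI.2 SYSTEM 'tei2.dtd'>"))
print(sniff_text_format(b"Once upon a time"))
```

A label of "plain text" or "markup" suggests that a text editor is a reasonable first tool for inspecting the file.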
The University of Oxford Text Archive (OTA) is a repository of digital literary and linguistic resources for research and teaching. We also offer advice to resource creators about best practice for creating digital resources, and to users of digital resources on how to benefit from existing resources.
The OTA used to host AHDS Literature, Languages and Linguistics.
The OTA is supported by Bodleian Libraries and IT Services, University of Oxford, and additional funding is sometimes acquired through project work. The archiving and repository services of the OTA are provided pro bono to the academic community.
The staff responsible for the OTA are physically located at Osney One Building in Oxford. Please see our Contact Information page for our addresses.
Access to our catalogue is free. You can search the collection and download resources for free. Until the end of March 2008, we offered free advisory and archival services to UK higher and further education institutions as part of our AHDS remit. We are still willing to provide these and other services, undertake project work, and provide consultation, and we have a pricing policy with regard to such activities. If you have any questions about our services, please do not hesitate to get in touch with us at email@example.com.
The OTA is involved in numerous projects, initiatives, services, special interest groups and associations. A list can be found on the projects page.
The OTA will continue to develop its current services but also wants to respond to changes in the needs of our user community. If you have any suggestions for developments that you would like to see, please let us know at firstname.lastname@example.org.
We maintain a list of current staff on our Contact Information page.
The Arts and Humanities Data Service (AHDS) was a national service (in the UK) aiding the discovery, creation and preservation of digital resources in and for research, teaching and learning in the arts and humanities. The AHDS covered five subject areas, and was organised via an Executive at King's College London and five AHDS Centres, hosted by various Higher Education Institutions. The AHDS was funded by the Joint Information Systems Committee and the Arts and Humanities Research Council. Visit the AHDS website for more information.
Yes. The OTA collections can be found via the Virtual Language Observatory, and the OTA is involved in a number of initiatives to share resources via CLARIN. The OTA is a registered CLARIN C Centre, and a migration to the CLARIN DSpace platform is under way, with a launch planned in 2017.
The OTA collection consists of resources deposited with us over a long period of time. Some resources may exist in more than one variant. These can be different editions of the same text, different electronic versions of the same edition, or the same resource in different formats, for example in plain text as well as SGML. To see what the difference is between two resources, follow the link to 'more info' for each resource.
The Text Encoding Initiative (TEI) issues guidelines for the mark-up of text. To learn more about the TEI, visit its web page at http://www.tei-c.org/. We have a great deal of expertise in TEI encoding and are able to provide detailed advice on preparing TEI XML resources.
No, the format of the resources varies. Each resource does, however, have a TEI header, containing information about the resource. The static pages that make up the website, such as this FAQ, are all also stored as TEI XML.
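As an illustration of what a TEI header holds, the sketch below reads the title from a small TEI XML document using Python's standard library. The sample document is invented, and the code assumes a header in the current TEI namespace (http://www.tei-c.org/ns/1.0); headers of older OTA resources may be encoded differently, for example in SGML, and would need other handling.

```python
import xml.etree.ElementTree as ET

# Namespace used by current TEI XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample for illustration; not an actual OTA resource.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An invented sample title</title></titleStmt>
      <publicationStmt><p>Invented for illustration.</p></publicationStmt>
      <sourceDesc><p>No source; sample only.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Sample text.</p></body></text>
</TEI>"""


def header_title(tei_xml: str):
    """Return the title recorded in the TEI header, or None if absent."""
    root = ET.fromstring(tei_xml)
    title = root.find(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
    )
    if title is None or not title.text:
        return None
    return title.text.strip()


print(header_title(SAMPLE))
```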
Standard Generalized Markup Language is an international standard used for annotating documents with information about structure and semantics in a way that both computers and humans can understand. HTML and XML are based on the earlier SGML standard.
XML stands for eXtensible Markup Language. It is a standardised way of tagging texts, in order to represent information about the structure of documents, and can also be used to add annotations and interpretative information. It is a simplified subset of the Standard Generalized Markup Language (SGML).
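To make the distinction between structure and annotation concrete, here is a small sketch using Python's standard library. The element and attribute names (poem, line, name, type) are invented for this example and are not taken from any OTA resource.

```python
import xml.etree.ElementTree as ET

# Invented sample: <line> elements record structure, while the
# <name type="place"> element adds an interpretative annotation.
SAMPLE = (
    '<poem>'
    '<line n="1">A walk through <name type="place">Oxford</name></line>'
    '<line n="2">in early spring</line>'
    '</poem>'
)

root = ET.fromstring(SAMPLE)

# Structural information: the numbered lines of the poem.
line_numbers = [line.get("n") for line in root.findall("line")]

# Annotation: every name that has been tagged as a place.
places = [name.text for name in root.iter("name") if name.get("type") == "place"]

# The running text of the first line, with the tags removed.
first_line_text = "".join(root.find("line").itertext())

print(line_numbers, places, first_line_text)
```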
The OTA started archiving texts before there were generally accepted standards for text formats. Some of our older resources were deposited in a format that is unknown or poorly documented, perhaps with annotation that does not follow any standards. Such texts are given the label 'unknown format'.
The resources in the OTA collection have been deposited with the Archive under different licenses. Some depositors require that you register, and sometimes also that you contact them, before you are allowed to download their resource. You have to request these resources first by filling out a form. Other resources can be downloaded freely, but this still involves providing your email address so that we can send you a link from which you can download the text.
Some of our depositors want to be consulted before anyone can access their resources. It may be that they want to know who is using the resource or that they are working on improving or expanding the resource and may have a later version available. The OTA encourages all depositors to make their works freely available if at all possible.
The resources in the OTA collection have been deposited by different individuals. Some are happy to make the resources freely available while others impose certain restrictions. One such restriction is that interested users have to register before they can download the resource. In order to simplify the maintenance of the OTA website and the delivery of its resources, we use the same process for both restricted and freely available resources. In the case of freely available ones, we only ask for your email address and send you a link to download the text fairly quickly. Those requesting restricted resources may have to wait longer.
The OTA is part of the University of Oxford, which is registered under the Data Protection Act 1998. Personal information submitted via forms within the ota.ox.ac.uk domain will be stored securely. This information may be used for a number of activities, such as statistical analysis to benefit the OTA user community, assisting with any queries you have regarding a resource you have downloaded, and, where required, keeping depositors informed of the users and uses of their material. We will not otherwise distribute, sell, trade or rent your personal information to third parties. Please also see our data protection statement.
A link to the resource you requested should have been emailed to you at the email address you provided when requesting the text. If you filled in the email address incorrectly, you will not receive your notification. If an hour or two has passed and you have not received your notification, and it hasn't ended up in your junk mail or spam folder, then try to request the text again. If you still have no luck, email us at email@example.com and we will try to find out what has gone wrong.
When you request a restricted text, a notification goes into a queue awaiting the receipt of your signed request form. We will then fulfil the requirements of the depositor (e.g. recording your information, contacting them for permission) and, when we are able to, we will email you a link from which you can download the resource. This may take days or weeks depending upon the conditions (and willingness to respond) of the depositor. If you are concerned that your request is taking a long time, email us at firstname.lastname@example.org and we will try to find out its status.
Yes - take a look at the corpora in the catalogue. In past years, the British National Corpus (BNC) was curated by Oxford University Computing Services separately from the Oxford Text Archive collections. It still has its own website: http://www.natcorp.ox.ac.uk/.
No, we can accept deposits in other formats as well, as long as they are of sufficient quality and come with good documentation. Some formats are less suitable for preservation and we may not be able to guarantee that these remain usable in the future. We may refuse deposits that are in an unsuitable format or for a variety of other reasons.
Your deposit (or another one you believe us to have) may not appear in our catalogue for a number of reasons. It may still be being accessioned, or (if previously available) it may have been taken offline. (The cause can range from migration or website maintenance to someone having expressed a copyright concern.) Such removals are sometimes temporary, and if one turns out to be permanent we will make a best-effort attempt to contact the original depositor. If the deposit is still not there when you check back after a reasonable length of time, please get in touch with us by email at email@example.com and we'll be happy to investigate the matter.