Internet Archive Scholar warehouses great materials that can still be useful to teachers and students in 2024 (Web and Cloud)

Jan 22, 2024 - 05:00
Jan 21, 2024 - 23:02

About Internet Archive Scholar

Internet Archive Scholar - An Academic Version of the Internet Archive. The Internet Archive warehouses all kinds of fantastic materials (and some not-so-fantastic) that can be useful to teachers and students.

What Is Internet Archive Scholar and How Does It Work?

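Content in the Internet Archive Scholar search index comes in one of three forms, matching the access types described later in this article: web content preserved in the Wayback Machine, files stored in general Internet Archive digital collections, and digitized copies of works on microfilm.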

A 2019 FORCE11 conference presentation gives an overview of the project's technical infrastructure and overall goals.

Content Sources

Metadata comes from fatcat.wiki, an open user-editable catalog of scholarly work.

Text and Data Mining

We intend to provide researcher access to the full corpus for text and data mining purposes. Derived datasets may also be posted publicly for analysis, for example a citation graph or N-gram frequencies by year. If you are interested or would like to see specific datasets made available, please contact us at scholar@archive.org.

Currently, snapshots of the full fatcat metadata corpus and of upstream metadata sources are uploaded periodically to the Bulk Bibliographic Metadata collection on archive.org. Read more in the Fatcat Guide.

Scholar User Guide

This service provides fulltext searching for research publications archived in Internet Archive's various collections. It includes content from the natural sciences, humanities, biomedicine, art, history, industrial research, government reports, and more.

Reader access to the content is provided when possible. Sometimes this access is to a preprint or other version of the work, and this is indicated in the search results. In other cases, depending on search filters, results include works for which there is only a bibliographic catalog entry. It may still be possible to obtain access to these through a public library or from the publisher directly.

Query Syntax

In addition to the basic filtering and sorting options, this search interface also allows the use of Lucene query syntax in the search box. You can restrict term queries on multiple metadata fields using colon statements like journal:Science, set filters like lang:de, and apply range queries like year:>1989 year:<2000.
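The colon statements and range queries described above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper names (`field`, `year_range`, `build_query`) are hypothetical, and the search URL and its `q` parameter are assumptions that should be checked against the live site before use.

```python
from urllib.parse import urlencode

# Assumption: base URL and "q" parameter name are not stated in this article.
SEARCH_URL = "https://scholar.archive.org/search"


def field(name: str, value: str) -> str:
    """Return a `name:value` clause, quoting values that contain whitespace."""
    if any(ch.isspace() for ch in value):
        escaped = value.replace('"', '\\"')
        value = f'"{escaped}"'
    return f"{name}:{value}"


def year_range(after: int, before: int) -> str:
    """Return the exclusive range form shown in the text: year:>A year:<B."""
    return f"year:>{after} year:<{before}"


def build_query(*clauses: str) -> str:
    """Join free-text terms and field clauses with single spaces."""
    return " ".join(c for c in clauses if c)


query = build_query(
    "coral bleaching",            # free-text terms
    field("journal", "Science"),  # journal:Science
    field("lang", "de"),          # lang:de
    year_range(1989, 2000),       # year:>1989 year:<2000
)
url = f"{SEARCH_URL}?{urlencode({'q': query})}"

print(query)
print(url)
```

The same string printed by `print(query)` can be pasted directly into the search box; the URL form is only a convenience for bookmarking or scripting.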

While this syntax allows for relatively complex and powerful queries, at some point advanced users may run into limits on the size or complexity of queries.  For the time being we recommend systems like lens.org for a more powerful interface.


Example Queries

As an experimental feature, if the search query "looks like" a formal citation, as found in the bibliography of a research paper, the service will attempt to parse the citation and match it against the catalog of known works. When this happens, any filters are ignored.

Metadata Fields

You can restrict to records where the field exists with an asterisk like doi:*, and negate any term like !type:article-journal.

In-depth documentation of the query syntax is available from the Elasticsearch project.

The complete current search document schema is available (as JSON) in the source code.

title:
author:
journal:
year:
issue:
volume:
doi:
tag: e.g., "oa" (as in tag:oa)
type: e.g., "article-journal", "dataset", "book"
stage: e.g., "published", "submitted", "accepted", "draft"
lang: value is a 2-character lower-case ISO language code
country: value is a 2-character lower-case ISO country code
access_type: "wayback", "ia_file", "ia_sim"
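The existence test (doi:*) and negation (!type:article-journal) forms can be combined with the field list above. The sketch below is illustrative only: the helper names are hypothetical, the validation rules are simply a restatement of the list (field names, 2-character lower-case codes, and the three access_type values), and the service itself may accept more than this.

```python
# Field names and access_type values copied from the list in this article.
KNOWN_FIELDS = {
    "title", "author", "journal", "year", "issue", "volume", "doi",
    "tag", "type", "stage", "lang", "country", "access_type",
}
ACCESS_TYPES = {"wayback", "ia_file", "ia_sim"}


def check_field(name: str) -> str:
    """Raise ValueError for a field name that is not in the documented list."""
    if name not in KNOWN_FIELDS:
        raise ValueError(f"unknown metadata field: {name!r}")
    return name


def exists(name: str) -> str:
    """Match records where the field is present, e.g. doi:*"""
    return f"{check_field(name)}:*"


def term(name: str, value: str) -> str:
    """Build name:value after basic checks taken from the field list."""
    check_field(name)
    if name in ("lang", "country"):
        if not (len(value) == 2 and value.isalpha() and value.islower()):
            raise ValueError(f"{name} expects a 2-character lower-case code")
    if name == "access_type" and value not in ACCESS_TYPES:
        raise ValueError(f"access_type must be one of {sorted(ACCESS_TYPES)}")
    return f"{name}:{value}"


def negate(clause: str) -> str:
    """Negate any clause with a leading exclamation mark."""
    return f"!{clause}"


filters = " ".join([
    exists("doi"),
    negate(term("type", "article-journal")),
    term("access_type", "wayback"),
])
print(filters)
```

Validating field names before sending a query is a design choice for catching typos early; it is not something the search interface is known to require.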

Search Results

Access Links

All Internet Archive preservation copy links have the same style and icon, including links to content from the Wayback Machine.
If the preserved copy of the work is from a pre-print, author manuscript, or other alternative version of the work, the access link has an indicator. You can get details and view all versions by clicking on the primary title link.
Some preserved content, particularly older Public Domain works, may be stored in general Internet Archive digital collections (as opposed to the web archive).
Digitized copies of works on microfilm may be linked to experimentally. Access may be limited to controlled lending.
A publisher landing page is the authoritative source for the "version of record" of a research publication, but content is not always accessible to the general public.
When the work is from an Open Access publication (sometimes known as "Gold" or "Diamond" OA), and the publisher is expected to provide access to all readers, the button has an orange "unlocked" icon.
If the work is archived in full on a reliable, open platform, additional links are sometimes provided.

Tags

Search results may have tag labels that provide additional context about the work, for example the indexes a journal is included in or the open platform technology used for publication.

Multiple Versions - There are multiple released "versions" or "editions" of this work, and bibliographic metadata for the "primary" is being shown. Click the title to see other versions.
lang:en - The primary language of this work is different from the search interface language. The ISO two-letter language code is indicated.
DOAJ - Published in a Directory of Open Access Journals publication, which implies that this is an Open Access work.
Szczepanski - Publication indexed in Szczepanski's List of Open Access Journals, which implies that this is an Open Access work.
Open Access - The work is believed to be "Open Access" for any other reason.
SciELO - Published on a SciELO national platform.
OJS - Published using Open Journal Systems software.
Wordpress - Published using WordPress software.
JSTOR - Preserved and/or hosted on the JSTOR digital preservation platform.


Authors and Publishers

Information specifically for authors of research works can be found at https://guide.fatcat.wiki/authors.html. This includes instructions for correcting bibliographic metadata and updates to published works.

In alignment with its mission, Internet Archive makes automated attempts to capture and preserve all open access research publications on the public web. If your open access journal articles aren’t currently included in Internet Archive Scholar, you can fill out the Internet Archive Scholar inclusion form to share journal URLs for crawling. Please note that it will take at least 3 months from the time you submit this form until your content starts to appear in Internet Archive Scholar. Emails about the status of your content sent less than 3 months after submitting this form will not be answered.

Additional information specifically for publishers can be found at https://guide.fatcat.wiki/publishers.html. This includes guidelines for having content indexed and preserved.

Contact Information

Queries about this search service and the fatcat catalog can be directed to scholar@archive.org. There is a public chat channel at https://gitter.im/internetarchive/fatcat.

Support and Acknowledgements

Work on Internet Archive Scholar has received support from the Andrew W. Mellon Foundation through multiple phases of the "Ensuring the Persistent Access of Open Access Journal Literature" project (see original announcement).
