The Open Syllabus Project (OSP) is building the first large-scale online database of university course syllabi as a platform for new research, teaching, and administrative tools.
We hope the OSP will improve our understanding of teaching, publishing, and intellectual history on a wide range of fronts, such as:
What are the most taught texts? How have fields changed? How do schools or departments within a field differ from one another? What is the demand for Open Access materials?
Because policies and norms around syllabus ownership vary, the OSP won’t publish syllabi without permission. The public side of the OSP will be a collection of tools for analyzing metadata extracted from the documents. We will also, in the course of this work, advocate for stronger open-access policies for syllabi.
We’re still building the community of interest in the project and would welcome help in four broad areas: access to collections of syllabi, tools development, relationships with libraries and archives, and exploration of the research potential of the database.
Wondering what’s new on the OSP GitHub? For those who were following the project on dhcolumbia/opensyllabus, we are now an independent GitHub group. Be advised that this is where we’ll be updating our code from now on.
This post was written by the Modern Language Association’s Jon Reeve, who participated in the OSP’s Hackathon last weekend. It was originally published on his blog here.
I was invited to hack around on the Open Syllabus Project this past Saturday, which I was really excited to do. They’ve scraped the web and come up with around 1.5 million syllabi, and only just released their API to researchers this weekend. I wanted to run some computational analyses on these syllabi, to try to answer questions like:
- What were the most frequently assigned texts in freshman composition courses?
- What disciplines exhibit the most variance between their syllabi? That’s to say, which subjects have the most similar syllabi, and which have the most divergent?
- What disciplines have the longest syllabi?
- In which disciplines do technological marker words like “blog” or “Twitter” appear most frequently?
To start with, I used a JSON file containing a subset of these syllabi (around 1,000) and tagged them by subject, using their first lines and filenames as hints. (See the quick and dirty code here.) I then imported another 600 subject-tagged syllabi from Graham Sack’s corpus, resulting in a subject-tagged corpus of around 1,300 syllabi. From there, I sorted the results by subject and ran them through NLTK to find collocations: words that frequently occur next to each other.
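The two steps described above can be sketched roughly as follows. This is not the code linked from the post: it is a minimal standard-library stand-in that counts adjacent word pairs instead of calling NLTK, and the keyword hints, stopword list, sample syllabi, and function names (`tag_subject`, `top_bigrams`) are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword hints; the tagging rules actually used are not shown in the post.
SUBJECT_HINTS = {
    "Geology": ("geology", "geol"),
    "Music": ("music", "choir"),
}

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "for", "is", "on", "we"}


def tag_subject(filename, text):
    """Guess a subject from the filename and the first line of a syllabus."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    hint = f"{filename} {first_line}".lower()
    for subject, keywords in SUBJECT_HINTS.items():
        if any(keyword in hint for keyword in keywords):
            return subject
    return None  # no hint matched; leave the syllabus untagged


def top_bigrams(texts, n=3, min_count=2):
    """Return the most frequent adjacent word pairs, skipping pairs with stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for pair in zip(words, words[1:]):
            if pair[0] not in STOPWORDS and pair[1] not in STOPWORDS:
                counts[pair] += 1
    return [(pair, c) for pair, c in counts.most_common(n) if c >= min_count]


# Made-up sample data standing in for the JSON subset of syllabi.
syllabi = [
    ("geol101.txt", "Introduction to Geology\nMineral resources and mineral resources policy."),
    ("mus200.txt", "Music Theory\nMeet in the rehearsal room."),
    ("geo_syllabus.txt", "GEOL 210 Earth Materials\nWe survey mineral resources worldwide."),
]

# Group syllabus texts by their guessed subject.
by_subject = defaultdict(list)
for filename, text in syllabi:
    subject = tag_subject(filename, text)
    if subject is not None:
        by_subject[subject].append(text)

for subject, texts in by_subject.items():
    print(subject, top_bigrams(texts))
```

Raw pair counts favor common phrases; NLTK's collocation finders can additionally rank pairs with statistical association measures, which is closer to what the post describes.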
There were some interesting findings. Some were predictable, like “mineral resources” for Geology, “corale room” for Music, or “Homeric Hymn” for Mythology. Others were revealing about required
The inaugural Open Syllabus Project Workshop & Hackathon, held June 6 – 7 at Columbia University, was dedicated to thinking through the social, legal, institutional, and technical challenges associated with the OSP, and to outlining and implementing plans for the coming year. With participants from institutions and organizations all over the country, spanning fields of expertise from copyright law to education technology to digital humanities to machine learning, the event brought together people who see great interest and potential utility in the development of the OSP. During panel discussions throughout the day, participants considered the implications of the project, including new publication metrics based on the most widely taught texts, a closer analysis of the procedural language of syllabi, and the basis for a larger examination of disciplinary history. Legal and policy issues, such as the question of who wields the authority to make decisions about legal rights and access to syllabi, were examined in conjunction with the future institutional and organizational conversations the OSP seeks to kindle. Finally, the OSP consulted technical experts on challenges such as Internet crawling and scraping, database architecture, and the extraction of structured data from unstructured texts.
The first half of the event was devoted to discussing the wide application of the 1.5-million-syllabus repository; the Hackathon on the second day marked the first opportunity for the community to begin playing with the brand new OSP API. Participants split into three large groups to tackle the broader questions