Welcome, Overview, to the OSP!

The Open Syllabus Project is happy to have Overview, an open-source document analysis and visualization system originally developed at the Associated Press for investigative journalists, as our newest project partner. The Overview team recently welcomed its own new addition, David McClure, who will be helping the OSP put 2 million scraped syllabi online, do natural language processing to extract citations from each syllabus, and build visualizations to do citation analysis on our massive corpus.

To extend our welcome to David and the rest of our new partners, we wanted to briefly introduce the Overview project and how its goals correspond with the OSP’s. As an open-source tool originally designed to help journalists find stories in large numbers of documents, Overview automatically sorts documents according to topic and provides a fast visualization and reading interface. Since the OSP has now accumulated more than 2 million syllabi, we’re glad to have Overview’s expertise on search and UI to help make all our data user-friendly and accessible.

Syllabi are increasingly messy documents, lacking standardization in both structure and, of course, content. That’s why Overview will be especially helpful for our task: Overview is designed specifically for text documents where “the interesting content is all in narrative form.” It’s been used to analyze emails, declassified document dumps, material from Wikileaks releases, social media posts, online comments, and now syllabi.

As its goal is to make advanced document mining capability available to anyone who needs it, Overview will critically support OSP’s citation analysis and back-end infrastructure. Ultimately, Overview will help us rigorously examine the millions of syllabi we currently have, their implications, and their impact. We’re looking forward to making the results of this analysis public for further discussion.

Overview is a project of The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge. Learn more about Overview and what the project has accomplished here.

December 10th, 2014 by

GUEST POST: The Open Curriculum Project

Here at the OSP, we’re always interested in seeing how others are tackling the complex problem of open curriculum sharing. So we were excited to hear about this “small open syllabus” project spearheaded by Sherif Mansour after attending Wikimania 2014. Here’s his guest blog post introducing his project to open Egypt’s educational system:

The Open Curriculum Project

I attended Wikimania 2014 in London, Wikipedia’s annual conference. I was inspired by some of the genuinely amazing people doing groundbreaking work, and decided to see how that could help Egypt’s educational system.

What did I do?

1) Uploaded a scanned copy of the school text books to wiki-commons, which is a part of Wikipedia for media files and documents.

2) Created a wiki-source “digital library” page which allows the user to edit each page and see the scanned copy side by side, allowing the quality of the material to improve over time.

3) Created a page that details the course and fosters a discussion around it.

4) The page can be downloaded as a PDF/EPUB file, meaning it can be read on almost all devices and e-readers while maintaining an up-to-date version of the textbook.

This video might help explain.

You may not know this, but Egypt’s Ministry of Education (MOE) has all the school books up on a public Microsoft “Drop Box” found here.

I then uploaded the first year secondary school text books to wiki-commons and tagged them as such (notice the “الصف_الاول_الاعدادي” tag). I tagged the books by Country, Year, Term, Subject and Language (sometimes).

But that’s not all I did — in fact that on its own is not really worth mentioning. Wikipedia has a website, a digital library of sorts called Wiki-source, and I created pages for some of these books and linked the scanned books to the digital library pages.

The results were quite promising.

What wiki-source did was create a page for each physical one in the scanned book, and populated it with text that it gathered by image recognition (OCR).

Putting it all together, you end up with a living document like this:


What does that mean for the Educational System?

Quite a lot, actually. The platform allows teachers, parents, academics, and almost anyone else to search the content and contribute to it, improving its quality like a living document. It allows people to have a conversation around each page (notice the discussion tab on every page). For students, of course, it means they have not only Wikipedia’s body of knowledge but also pages specific to their courses, pages they can customize. And even if they lose their physical books, the material is always there online.

What can you do from there?

Plenty! Right now the Wiki pages are almost a blank canvas (I only started a few hours ago), with outlines and tables of contents. But because the books themselves are uploaded, there is more than enough material to build on.

If you are a teacher/parent/academic or just simply interested in helping out, then you could contribute in the following ways:

1) Spread the word! Forward this email, and share some of this on social media or with NGOs that you think would benefit.

2) Go and download the E-books from the Dropbox. Upload them to Wikicommons and do not forget to tag them with categories. I used the following examples:

(Year) الصف الاول الاعدادي
(Term) الفصل الدراسي الثاني
(official book) وزارة التربية و التعليم
(country) جمهرية مصر العربية

3) Create a Wikisource index using the same name as the wiki-commons file. For example:

The file “Science.Tr1.pdf” in Wikicommons becomes https://en.wikisource.org/wiki/Index:Science.Tr1.pdf.

For each wiki-source page you can then edit the content and create a page that completes the picture. For example, make the title link to an article page: in the science example the title was set to [[Science 1st year Secondary School 1st Term]], which then allows you to create https://en.wikisource.org/wiki/Science_1st_year_Secondary_School_1st_Term.
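The naming pattern in step 3 can be sketched in a few lines of Python. This is only an illustration of how the two example URLs above are formed; the helper names `index_url` and `page_url` are made up for this post and are not part of any Wikimedia tool.

```python
from urllib.parse import quote

# Base address taken from the example URLs in this post.
WIKISOURCE_BASE = "https://en.wikisource.org/wiki/"


def index_url(commons_filename: str) -> str:
    """Build the Wikisource index URL for a file uploaded to Wikicommons.

    Hypothetical helper: it prefixes the file name with "Index:" and
    replaces spaces with underscores, as wiki page URLs do.
    """
    return WIKISOURCE_BASE + "Index:" + quote(commons_filename.replace(" ", "_"))


def page_url(title: str) -> str:
    """Build the URL of the article page that a [[title]] link points to."""
    return WIKISOURCE_BASE + quote(title.replace(" ", "_"))


print(index_url("Science.Tr1.pdf"))
print(page_url("Science 1st year Secondary School 1st Term"))
```

Running it prints the same two addresses used in the science example, so you can check a new file name against the pattern before creating the index page.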

Are there legal concerns?

Good question. It doesn’t look like it. According to Article 141 of Intellectual Property Law 82 of 2002, these works are not an object of copyright in Egypt because they are official documents. Regardless of their source or target language, all official documents are ineligible for protection in Egypt (use Wikipedia’s {{PD-Egypt-official}} license category; see https://commons.wikimedia.org/wiki/File:Egyptian_Intellectual_Property_Law_82_of_2002_(English).pdf). While the final decision rests with the Egyptian government, of course, the fact that they have made the books public to anyone over the internet via a Microsoft Open Drive speaks to their intent for how these books are to be used.

I also had a very long discussion about it with members of Wikipedia’s editorial and legal teams, and they were ultimately fine with it.

November 3rd, 2014 by

ProfHacker: On Sharing Syllabi

Konrad Lawson recently wrote a post on The Chronicle of Higher Education’s ProfHacker Blog commiserating over the headaches of crafting syllabi: “Having read so much and so deeply over the years on similar topics, how do profs putting together a new class remember where the good stuff is and, more importantly, just enough of the good stuff that it fit the recipe for the course? Surely designing an excellent course from scratch like this requires a huge amount of work, and years of tweaking.”

Indeed, it does, but there are undoubtedly ways to make it easier. As those of us in the OSP community know, the first step is to change the way we think about sharing our syllabi. Here are some proposals Lawson makes on how to do that:

  • “When you write your syllabus, include the year and semester it was taught. This can serve as a version number, and will allow different versions to be acknowledged accordingly.
  • Consider uploading your syllabus somewhere relatively stable online, to reduce the chances of link rot or keep versions alive online where search engines can find them. Upload, for example, to archive.org or to github.com, or somewhere your university is unlikely to take it down. Remember LOCKSS (Lots of Copies Keep Stuff Safe).
  • Consider adding a Creative Commons or other open license to your syllabus somewhere explicitly so that, beyond “fair use” and the un-copyrightable nature of ideas, others can feel comfortable adopting and modifying (with attribution) larger chunks of, say, your assignment descriptions or class policies.
  • Consider putting the above info in a convenient meta-data section at the bottom of your syllabus so that it can be easily found, or as an additional file alongside it if it is in a repository (for example, on github). Hopefully, if this catches on, it might further facilitate the kind of larger scale work by projects such as the Open Syllabus Project. Other useful metadata to include there might be some keywords related to your course, its level of difficulty, the expected size of the class, whether it is a lecture, seminar, etc. course, and other basic info that might be found elsewhere throughout the syllabus such as the instructor, university, course title, and version.
  • As sharing of syllabi becomes more common, we will be more conscious of “other readers” beyond our students of the document. One question those readers will often ask themselves, as they decide whether they want to adopt some of your readings, policies, or assignment types, or other material is: did it work? What worked well, and what needed more refinement? The answer to these questions can sometimes be detected in the changes from one syllabus to the next when multiple years are available, but a “changelog” of some kind indicating what bugs were fixed and new features added can be useful, even if, like the “changelog” of computer code, it is just a quick list of bullet points.
  • When acknowledging other syllabi, be specific, and if a digital version of your syllabi exists, include a link (but also enough words from the title and year to enable a targeted search if the link no longer works).”
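To make the metadata and changelog suggestions a little more concrete, here is a minimal Python sketch of what a metadata file stored alongside a syllabus might contain. Lawson does not prescribe a format: every field name and every value below is a placeholder we made up for illustration, and JSON is just one convenient choice.

```python
import json

# Hypothetical metadata for one syllabus. Field names and values are
# illustrative placeholders, not a standard and not taken from Lawson's post.
syllabus_metadata = {
    "course_title": "Example Course Title",
    "instructor": "Example Instructor",
    "university": "Example University",
    "version": "2014-fall",  # year and semester double as a version number
    "license": "CC BY 4.0",
    "keywords": ["example keyword"],
    "level": "introductory",
    "expected_class_size": 25,
    "format": "seminar",
    "changelog": [
        "Example entry: replaced one week of readings",
        "Example entry: shortened the final assignment",
    ],
}


def to_sidecar_json(metadata: dict) -> str:
    """Serialize the metadata so it can be saved next to the syllabus file."""
    return json.dumps(metadata, indent=2, ensure_ascii=False, sort_keys=True)


print(to_sidecar_json(syllabus_metadata))
```

A file like this could sit in the same repository as the syllabus, which would make the version, license, and changelog easy to find for both human readers and projects that collect syllabi at scale.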

Read more on Lawson’s original post.

September 10th, 2014 by

OSP Workshop Panel Discussions Now Live

Missed the OSP Workshop & Hackathon in June? You can catch up on the discussion in full recordings of our Workshop Panels here. Here’s the full schedule of panels and their participants:



Session 1: http://youtu.be/cZTjrEsOtFw

Introduction to the Open Syllabus Project Workshop

Description: The Open Syllabus Project (OSP) is building the first large-scale online database of university course syllabi as a platform for the development of new research, teaching, and administrative tools. The inaugural OSP Workshop is devoted to thinking through the social, legal, institutional, and technical challenges associated with this project, and to outlining and implementing plans for the coming year.

The Power of Lots of Syllabi

Description: The workshop brings together researchers and groups who are interested in what can be done with lots of syllabi. One virtue of the OSP — we hope — is that it can serve as a platform for a wide range of these projects.  We want to start the day by putting these goals and agendas on the table for discussion, in the hope of better understanding their implications for the project, the partnering institutions, and the wider scholarly community.

  • Ted Byfield, The New School, Institutional Transparency Agendas
  • Rachel Buurma, Swarthmore College
  • Tessa Joseph Nichols, University of North Carolina – Chapel Hill
  • Dennis Tenen, Columbia University, Computational Methods for Literary Analysis

Session 3: http://youtu.be/5NiKLTgtJdQ

Legal and Social Challenges around Syllabi

Description: There are no settled norms regarding the ownership, status, or collection of syllabi. Some schools require posting to departmental websites. Some faculty do so independently. Some view syllabi as quasi-trade secrets or, at least, as a proxy for their pedagogical freedom. University policies are often unclear. The challenges of collecting and curating syllabi raise both legal and social challenges — especially given the open-access agenda of the OSP.


  • Kenny Crews, Gipson Hoffman & Pancione
  • Joe Karaganis, The American Assembly

Session 4: http://youtu.be/mTl6horhX1w

Technical Challenges of the Open Syllabus Project

Description: This session explores some of the technical features and challenges facing the OSP during the next year of development (and in the Hackathon). Topics range from Internet crawling and scraping, to choices regarding database architecture and the API, to the opportunities for (and limits of) machine-learning-based extraction of structured data from the documents.

  • Dennis Tenen, Columbia University
  • Miao Chen, HathiTrust Research Center
  • Apoorv Agarwal, Columbia University
  • Jian Wu, Citeseer, Penn State

Challenges/Lessons of Similar Efforts to the Open Syllabus Project

Description: The OSP raises issues of management, openness, interoperability, and sustainability that have been addressed in other projects, from learning management platforms to new archives and libraries. What can the OSP learn from those projects? What could the OSP look like in 3 years? In 10?

  • Amy Nurnberger, Center for Digital Research and Scholarship
  • Judd Ratner, Intellidemia
  • Sandeep Jayaprakash, Sakai
  • Robert McDonald, HathiTrust Research Center
  • Caitlin Trasande, Digital Science

Session 6: http://youtu.be/Jy0XO7cndps

Concluding Remarks on the Open Syllabus Project & Hackathon Introduction

Description: We would like to leave with a high-level plan of action for the next year and, to the extent possible, a set of commitments from partners and participating teams to contribute to the institutional and technical development of the OSP. We also discuss aims and goals for the OSP Hackathon/Sprint taking place the following day.

  • Joe Karaganis, The American Assembly
  • Ted Byfield, The New School, Institutional Transparency Agendas


Also check out our overview of the Workshop & Hackathon, including pictures and community buzz here.

August 11th, 2014 by