Increasing the Productivity of Scholarship

The Case for Knowledge Graphs

Paul Groth - Elsevier Labs

@pgroth | pgroth.com | labs.elsevier.com

Slides: http://pgroth.com/slides/savesd2015.html#

Outline

  • Productivity in scholarship
  • A model
  • Current solutions
  • Knowledge Graphs & Why
  • Challenges

Productivity

Global Scholarly Output


64% growth in scholarly output

Growth in researchers correlates with journal growth. Growth in researchers: 4–5% per year (STM Report)

R&D Intensity

US Federal R&D Investment

Scholarly productivity is not increasing

Why?

The Burden of Knowledge

  • Benjamin F. Jones
  • The Burden of Knowledge and the ‘Death of the Renaissance Man’: Is Innovation Getting Harder?
  • NBER Working Paper 11360
  • http://www.nber.org/papers/w11360

The Model (Roughly)

"If one is to stand on the shoulders of giants, one must first climb up their backs, and the greater the body of knowledge, the harder this climb becomes." (Jones)

  • Knowledge accumulates
  • Need for more education
  • => Innovators narrow their expertise
  • => Larger teams
  • => More overhead to produce new knowledge

Facts & Figures

  • U.S. team size is increasing by 17% per decade
  • Specialization is increasing by 6% per decade
  • The age at which Nobel Prize winners made their prize-winning contributions increased by 6 years over the 20th century
  • R&D employment has risen dramatically, yet total factor productivity (TFP) growth has been flat (Jones, 1995b)
  • The average number of patents produced per R&D worker has been falling over time across countries (Evenson, 1984)
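To make the per-decade figures concrete, here is a minimal sketch that compounds them over ten decades. Both the compounding and the ten-decade horizon are assumptions for illustration; the slide gives only per-decade rates.

```python
# Illustrative arithmetic only. ASSUMPTION: the per-decade rates on the slide
# compound multiplicatively; the 10-decade horizon is chosen for illustration.

def cumulative_factor(rate_per_decade: float, decades: int) -> float:
    """Multiplicative change after `decades` of compound growth at `rate_per_decade`."""
    return (1.0 + rate_per_decade) ** decades


team_size = cumulative_factor(0.17, 10)       # 17% per decade
specialization = cumulative_factor(0.06, 10)  # 6% per decade
print(f"team size: x{team_size:.2f}")            # team size: x4.81
print(f"specialization: x{specialization:.2f}")  # specialization: x1.79
```

Under this assumption, a century of such growth would mean teams almost five times larger.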

Reading more but with less time

  • Time per article read fell from "45-50 minutes in the mid-1990s to just over 30 minutes" (The 2015 STM Report)
  • Growth in number of papers read

Citing more...

References per article

  • Long-Term Variations in the Aging of Scientific Literature: From Exponential Growth to Steady-State Science
  • Vincent Larivière, Éric Archambault, Yves Gingras. JASIST

Age at first innovation

Inventors per patent

Number of co-authors

1000 author paper

All is not lost

101 Innovations in Scholarly Communication

101 Innovations in Scholarly Communication by Jeroen Bosman & Bianca Kramer

Great! But not enough

Knowledge Graphs

Knowledge Graph Definition:

Graph-structured knowledge bases (KBs) that store factual information in the form of relationships between entities

  • Nickel et al. "A Review of Relational Machine Learning for Knowledge Graphs" arXiv:1503.00759v1
  • Typically include some form of context or probability associated with each fact
  • A nice tutorial
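A minimal sketch of the definition above: facts are stored as subject-predicate-object triples, and each one carries a confidence and a source (its context). The `Fact` and `KnowledgeGraph` names and all confidence values are hypothetical. Real systems use dedicated triple stores or graph databases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed probability in [0, 1] that the fact holds
    source: str        # context: where the fact was asserted or extracted


class KnowledgeGraph:
    """A hypothetical in-memory triple store; not any particular product's API."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add(self, fact: Fact) -> None:
        self.facts.append(fact)

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None, min_confidence: float = 0.0) -> List[Fact]:
        """Return facts matching the pattern; None acts as a wildcard."""
        return [
            f for f in self.facts
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and (obj is None or f.obj == obj)
            and f.confidence >= min_confidence
        ]


kg = KnowledgeGraph()
kg.add(Fact("Benjamin F. Jones", "authored", "NBER Working Paper 11360", 0.99,
            "slide: The Burden of Knowledge"))
kg.add(Fact("Peters SE", "authored", "doi:10.1371/journal.pone.0113523", 0.95,
            "slide: Example: Paleontological Databases"))
# A deliberately wrong, low-confidence fact, as a noisy extractor might produce.
kg.add(Fact("Peters SE", "authored", "NBER Working Paper 11360", 0.10,
            "hypothetical noisy extraction"))

for fact in kg.query(subject="Peters SE", min_confidence=0.5):
    print(fact.obj, "|", fact.source)
```

Filtering on confidence drops the noisy fact. The `source` field keeps each fact tied to where it came from.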

Knowledge Graph Snippet

Review of sizes of knowledge graphs

How does the concept help?

Integrate four core knowledge types

  1. Databases
  2. Text
  3. Models
  4. Social Networks
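One way to picture this integration, as a sketch with made-up identifiers: database rows, text extractions, model outputs and social-network edges all become triples in one graph. A single query can then cross sources, here by finding people who wrote papers that mention a gene. The predicates and the `integrate`, `neighborhood` and `people_writing_about` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str, str]  # (subject, predicate, object, knowledge type)

# Hypothetical inputs, one per knowledge type; all identifiers are made up.
database_rows = [{"id": "gene:G1", "organism": "species:S1"}]
text_extractions = [("paper:P1", "mentions", "gene:G1")]
model_outputs = [("model:M1", "predicts_function_of", "gene:G1")]
social_edges = [("person:R1", "wrote", "paper:P1"),
                ("person:R1", "coauthor_of", "person:R2")]


def integrate() -> List[Triple]:
    """Map every source into one list of triples tagged with its knowledge type."""
    triples: List[Triple] = []
    for row in database_rows:
        triples.append((row["id"], "from_organism", row["organism"], "database"))
    for kind, rows in (("text", text_extractions), ("model", model_outputs),
                       ("social", social_edges)):
        triples.extend((s, p, o, kind) for s, p, o in rows)
    return triples


def neighborhood(triples: List[Triple], entity: str) -> Dict[str, Set[Tuple[str, str]]]:
    """Group everything directly linked to `entity` by the knowledge type it came from."""
    out: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for s, p, o, kind in triples:
        if s == entity:
            out[kind].add((p, o))
        elif o == entity:
            out[kind].add((p, s))
    return dict(out)


def people_writing_about(triples: List[Triple], entity: str) -> Set[str]:
    """Two-hop query: join text (paper mentions entity) with social data (person wrote paper)."""
    papers = {s for s, p, o, _ in triples if p == "mentions" and o == entity}
    return {s for s, p, o, _ in triples if p == "wrote" and o in papers}


graph = integrate()
print(neighborhood(graph, "gene:G1"))
print(people_writing_about(graph, "gene:G1"))
```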

Coffee genome database

Lots of text

2.5 million articles a year

A couple of notes:

  • We too often look at articles as independent blocks
  • Many databases are curated from text (e.g., ChEMBL, Reaxys)

Example: Paleontological Databases


  • Paleobiology Database (PBDB; http://paleobiodb.org)
  • Peters SE, Zhang C, Livny M, Ré C (2014) A Machine Reading System for Assembling Synthetic Paleontological Databases. PLoS ONE 9(12): e113523. doi:10.1371/journal.pone.0113523
    • "the majority of the data were extracted from approximately 40,000 publications"
    • "leverages only a small fraction of all published paleontological knowledge"
    • "because the end product of manual data entry is a list of facts that are divorced from most, if not all, original contexts, assessing the quality of the database and the reproducibility of results is difficult."
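The provenance point above can be illustrated with a toy extractor that stores each fact next to the sentence it came from. It applies a hypothetical regex pattern to invented sentences and is not the DeepDive-based approach of Peters et al. It only shows how facts can be kept together with their original context.

```python
import re
from typing import List, NamedTuple


class ExtractedFact(NamedTuple):
    taxon: str
    formation: str
    doc_id: str
    sentence: str  # original context kept next to the fact


# Hypothetical surface pattern: "<Genus species> occurs|was found in the <Name> Formation".
PATTERN = re.compile(
    r"\b([A-Z][a-z]+ [a-z]+) (?:occurs|was found) in the ([A-Z][a-z]+) Formation"
)


def extract(doc_id: str, text: str) -> List[ExtractedFact]:
    facts: List[ExtractedFact] = []
    # Naive sentence split on whitespace that follows a period.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for m in PATTERN.finditer(sentence):
            facts.append(ExtractedFact(m.group(1), m.group(2), doc_id, sentence))
    return facts


doc = ("Examplea fossilis occurs in the Sample Formation. "
       "The matrix is a fine shale. "
       "Testus minor was found in the Demo Formation.")
for fact in extract("doc:1", doc):
    print(fact.taxon, "|", fact.formation, "|", fact.sentence)
```

Because every row still points to its document and sentence, a reviewer can check a fact against its source instead of taking a bare database entry on trust.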

DeepDive Architecture: Text + DB


Models

  • Many models can be expressed in terms of graph structures.
  • Examples
    • Link prediction
    • Collective classification
    • Entity resolution
    • Cellular networks
    • Input to question-answering (QA) systems
  • Potential for sharing common variables across models
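Link prediction is one example of a model expressed over a graph. A simple version uses the classic Jaccard neighbor-overlap heuristic: an unlinked pair that shares many neighbors gets a high score as a likely link. This baseline is chosen for illustration and is not one of the latent-feature methods reviewed by Nickel et al. The co-authorship graph is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def jaccard_link_scores(edges: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Score every unlinked node pair by the Jaccard overlap of their neighbor sets."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    scores: Dict[Tuple[str, str], float] = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked; nothing to predict
        union = neighbors[u] | neighbors[v]  # non-empty: every node has a neighbor
        scores[(u, v)] = len(neighbors[u] & neighbors[v]) / len(union)
    return scores


# Hypothetical co-authorship graph.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
scores = jaccard_link_scores(edges)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this graph, B and C share both of their neighbors, so they receive the top score of 1.0.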

FoxPSL

Domain Size

Sara Magliacane et al. FoxPSL: An Extended And Scalable PSL Implementation. AAAI Spring Symposium 2015 on Knowledge Representation and Reasoning.
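FoxPSL implements probabilistic soft logic (PSL). In PSL, rules range over soft truth values in [0, 1] and are relaxed with Łukasiewicz logic. Each grounded rule adds a weighted hinge loss equal to its distance to satisfaction. Below is a minimal sketch of that scoring with a hypothetical citation/topic rule and made-up truth values. It is not FoxPSL's API, and it omits inference, which minimizes the total penalty.

```python
from typing import Sequence, Tuple


def lukasiewicz_and(values: Sequence[float]) -> float:
    """Lukasiewicz t-norm over soft truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, lukasiewicz_and(body) - head)


def weighted_penalty(groundings: Sequence[Tuple[Sequence[float], float]],
                     weight: float, squared: bool = False) -> float:
    """Weighted (optionally squared) hinge loss summed over all groundings of one rule."""
    total = 0.0
    for body, head in groundings:
        d = distance_to_satisfaction(body, head)
        total += weight * (d * d if squared else d)
    return total


# Hypothetical rule: Cites(A, B) & Topic(B, T) -> Topic(A, T), weight 2.0.
groundings = [
    ((0.9, 0.8), 0.4),  # body truth about 0.7, head 0.4 -> distance about 0.3
    ((0.5, 0.3), 0.0),  # body truth 0.0 -> already satisfied
]
print(weighted_penalty(groundings, weight=2.0))
```

Because each term is a convex hinge, the combined objective stays convex. That property is what lets PSL implementations scale inference to large domains.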

Same knowledge, different techniques

Wikidata and GraphX
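The same facts can feed different techniques. A pattern lookup treats triples as a table, while PageRank treats them as a graph; GraphX, Spark's graph library, includes a PageRank implementation. The sketch below is plain Python over hypothetical citation triples and does not use the GraphX API.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical citation triples.
triples: List[Triple] = [
    ("paper:P1", "cites", "paper:P3"),
    ("paper:P2", "cites", "paper:P3"),
    ("paper:P3", "cites", "paper:P4"),
    ("paper:P4", "cites", "paper:P3"),
]


def lookup(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """Technique 1: treat the knowledge as facts and answer a pattern query."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def pagerank(triples: List[Triple], predicate: str,
             damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Technique 2: treat the same facts as a directed graph; PageRank by power iteration."""
    edges = [(s, o) for s, p, o in triples if p == predicate]
    nodes = sorted({n for edge in edges for n in edge})
    out: Dict[str, List[str]] = {n: [] for n in nodes}
    for s, o in edges:
        out[s].append(o)
    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / n_nodes for n in nodes}
        for n in nodes:
            targets = out[n] if out[n] else nodes  # dangling node: spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


print(lookup(triples, "cites", "paper:P3"))
ranks = pagerank(triples, "cites")
print(max(ranks, key=ranks.get))
```

Here the lookup returns the three papers that cite P3, and PageRank also ranks P3 highest.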

The Burden of Knowledge?

  • Have computers attack the problem
  • Perform synthesis between the various kinds of knowledge we produce
  • Come up to speed by having information in one place
  • Ability to make smaller contributions that spread faster
    • e.g., Wikidata

Challenges

  • Integration with modeling environments
  • User interaction
    • Are cards the only interface?
    • Is voice really the right way?
  • Tackling highly specific domain knowledge

Conclusion

  • The problem of too much knowledge is very real in scholarship
    • The Burden of Knowledge
  • New computational tools are necessary, but
    • We should also look at addressing systemic problems
    • See also:
      • discoveryinformaticsinitiative.org
      • DARPA Big Mechanism
  • One exciting tool is knowledge graphs

We are hiring

Elsevier

  • recommender systems
  • machine learning
  • big data processing
  • graph analytics

Contact: @pgroth | @elsevierlabs