Playing with machines: Using machine learning to understand automated copyright enforcement at scale

Big Data and Society 7 (1) (2020)

Abstract

This article presents the results of methodological experimentation that utilises machine learning to investigate automated copyright enforcement on YouTube. Using a dataset of 76.7 million YouTube videos, we explore how digital and computational methods can be leveraged to better understand content moderation and copyright enforcement at a large scale. We used the BERT language model to train a machine learning classifier to identify videos in categories that reflect ongoing controversies in copyright takedowns. We use this to explore, in a granular way, how copyright is enforced on YouTube, using both statistical methods and qualitative analysis of our categorised dataset. We provide a large-scale systematic analysis of removal rates from Content ID’s automated detection system and the largely automated, text-search-based Digital Millennium Copyright Act notice-and-takedown system. These are complex systems that are often difficult to analyse, and YouTube only makes available data at high levels of abstraction. Our analysis provides a comparison of different types of automation in content moderation, and we show how these different systems play out across different categories of content. We hope that this work provides a methodological base for continued experimentation with the use of digital and computational methods to enable large-scale analysis of the operation of automated systems.

Similar books and articles

What Is a Copyright Work? Brad Sherman - 2011 - Theoretical Inquiries in Law 12 (1):99-121.
Copyright and educational policies: A stakeholder analysis. Suthersanen Uma - 2003 - Oxford Journal of Legal Studies 23 (4):585-609.
Copyright Licensing. Richard Hooper - 2013 - Logos 24 (2):33-40.
Granularity Analysis for Mathematical Proofs. Marvin R. G. Schiller - 2013 - Topics in Cognitive Science 5 (2):251-269.
Mass Surveillance: A Private Affair? Kevin Macnish - 2020 - Moral Philosophy and Politics 7 (1):9-27.

Analytics

Added to PP
2020-11-24

Downloads
20 (#747,345)

6 months
8 (#342,364)

