A lot of the metadata from the web is not very complete or accurate. It would be nice if, when another Paperpile user has already uploaded the same PDF and corrected its metadata, Paperpile could offer to auto-complete the metadata on my PDF based on that user's corrections, not just on what is available in online databases. The more people use Paperpile, the more useful this feature would become. Of course, it would be optional, perhaps letting you compare the changes before applying them. And anonymous: no sharing of tags or notes, just the basic metadata. I have no idea whether it would be doable or practical, but I thought I'd throw it out there in case it is a useful feature to consider.
First, if something is not complete or accurate, please report it. We are constantly improving. We can't reply individually to every report of a failed import, but we review them all systematically and, as you can see in our changelog, ship improvements in almost every release: https://paperpile.com/changelog
In principle I like the idea of crowdsourcing the edits. It's a matter of reaching a critical mass of users, though. One problem is that the sources with imperfect metadata are probably the ones used by only a minority of users; common sources, where many users are likely to import the same paper, are typically accurate, I would guess.
But if you can help us increase our user numbers by an order of magnitude, we are happy to consider this approach.
This is just not practical. For one thing, I don’t really pay that much attention unless PP says “incomplete” or I am preparing a bibliography for publication. At that time I might notice that the location is missing, or the book title also includes the series title, or I only have the author’s initial when I need the full name, etc. It just doesn’t seem possible to fully and accurately automate all of this, which is why I often think about crowdsourcing it…
Ok, I agree. Subtle inconsistencies and missing or incorrect data are hard for the user to report and also hard for us to fix. In many cases we are limited by the data we get from the source, and if it isn't perfect we can't really fix it.
I wonder if it isn’t possible to compare various sources? For instance, I often add books from Amazon or Google Books because it is faster and easier to do so, but I think WorldCat has better quality data … Not sure how this would work?
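One way the comparison could work, as a rough sketch: fetch the same reference from each source and keep, field by field, the value from the most trusted source that has one. The source names, trust priorities, and record layout below are purely illustrative assumptions, not how Paperpile actually works:

```python
# Sketch: reconcile one reference's metadata from several sources,
# preferring more trusted sources when fields disagree.
# Source names, priorities, and fields are illustrative only.

SOURCE_PRIORITY = {"worldcat": 3, "google_books": 2, "amazon": 1}

def reconcile(records):
    """records: {source_name: {field: value}} -> merged {field: value}."""
    merged = {}
    # Visit sources from most to least trusted.
    for source in sorted(records, key=lambda s: SOURCE_PRIORITY.get(s, 0),
                         reverse=True):
        for field, value in records[source].items():
            # Keep the first (i.e. most trusted) non-empty value per field.
            if value and field not in merged:
                merged[field] = value
    return merged

example = {
    "amazon": {"title": "Some Book", "year": "", "publisher": "ACME"},
    "worldcat": {"title": "Some Book: A Subtitle", "year": "2003"},
}
print(reconcile(example))
# title and year come from WorldCat; publisher falls back to Amazon
```

A real version would also need to match records across sources (by ISBN or DOI, say) before merging, which is its own hard problem.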
Hi, I just had this idea as well.
I probably have lots of references in my library that many other Paperpile users also have.
Most of the metadata should be the same; when it isn't, it usually means that one copy is correct and the others are wrong or incomplete.
Paperpile could also save storage space at the same time (like Dropbox does).
Has this been given any thought?
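To make the consensus idea concrete, here is a minimal sketch of per-field majority voting across different users' copies of the same reference. The threshold and tie-breaking behaviour are my assumptions, not anything Paperpile has designed:

```python
from collections import Counter

def crowd_value(values, min_agreement=2):
    """Pick the value most users agree on for one field, if enough agree.

    values: the field's value in each user's copy of the same reference.
    Returns None when there is no sufficiently supported consensus.
    (Threshold and behaviour here are assumptions, not Paperpile's design.)
    """
    counts = Counter(v for v in values if v)  # ignore empty entries
    if not counts:
        return None
    value, support = counts.most_common(1)[0]
    return value if support >= min_agreement else None

# Three users hold the same paper; one copy has a typo in the year.
print(crowd_value(["2003", "2003", "2030"]))  # -> "2003"
print(crowd_value(["2003"]))                  # -> None (no consensus yet)
```

The threshold is what makes this safer than trusting any one user: a single wrong copy can't overwrite everyone else's data.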
Some sort of crowdsourcing system that lets Paperpile users both propose bibliographic metadata changes and review and verify changes proposed by others would be great. Obviously not a small lift, and it would require some careful design consideration.
Spitballing some ideas on this:
Perhaps some sort of gamified point system to reward contributions and accuracy, distinguish ‘expert’ metadata submitters/reviewers, and penalize spamming, incorrect data, and improper validation/review. The penalty component could allow some leeway for one-off mistakes.
If Paperpile’s bibliographic metadata database isn’t a proprietary asset, it could even be open-sourced or made available to Paperpile subscribers.
Perhaps a crowdsourced metadata app like CLEF (Crowdsourcing Linked Entities via web Form) could be leveraged (or used as inspiration) as part of this system, combined with the aforementioned gamification features.
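The gamified point system could look something like the toy sketch below. The reward/penalty weights, the floor, and the scoring rule are all hypothetical choices, just to show how a scheme might punish bad edits harder than it rewards good ones while still tolerating one-off mistakes:

```python
def update_reputation(rep, accepted, weight_good=1, weight_bad=3, floor=0):
    """Adjust a contributor's reputation after a metadata edit is reviewed.

    Hypothetical scheme: an accepted edit earns a small reward; a rejected
    edit costs more, discouraging spam and sloppy validation. The floor
    keeps a single mistake from permanently sinking a good contributor.
    """
    rep = rep + weight_good if accepted else rep - weight_bad
    return max(rep, floor)

# A contributor with one rejected edit among several accepted ones.
rep = 0
for outcome in [True, True, True, False, True]:
    rep = update_reputation(rep, outcome)
print(rep)  # -> 1
```

Thresholds on reputation could then gate privileges, e.g. only ‘expert’ contributors get to approve others’ changes.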
+1 for sergei’s comments. I think this kind of functionality would actually be a core product differentiator for Paperpile