Why just PDFs; why not clip HTML/HTML5

As a long-time Zotero user, I am continually excited about Paperpile and how it is set to evolve with new features.

One thing I would find indispensable is the ability to easily clip an entire web page in addition to (and often instead of) downloading the PDF. More often than not, it's the text I need, not the pictures. Being able to take that text and format it in different ways on different devices is liberating. It's ironic that as we try to become more digital, we remain locked into old ways of absorbing information. Searching through and annotating this data would also be fairly straightforward.

I would like to be able to clip the HTML of an article from the website and automatically attach it to the citation. The pages currently in my library were imported from my old Zotero repository. Right now I have to save a page manually and then manually associate it with a citation, so being able to simply clip the page would be great. The saved page could link to the website's CSS, be completely style-free, or be reformatted in an Instapaper-style HTML5 layout.

PubMed Central is a strong proponent of this approach, as are many of the larger publishers. Incidentally, the CSS and JS code used to create a PubReader presentation is available in their GitHub repository NCBITools/PubReader. Anyone can use or adapt it to display journal articles or other content structured as an HTML5 document.

(I thought this particular link was an appropriate example: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3830982/?report=reader)

If this isn't on your horizon, I'd appreciate hearing other people's opinions and workflows for achieving this.



Actually, this is on our horizon. There are many reasons why we like it, and we have already had many internal discussions about it. We looked at PubReader and eLife Lens and their underlying data models.

The problem is that it won't replace PDFs completely any time soon, and I don't think that should be the goal anyway. So it would need to be implemented in addition to the PDF workflow. We don't have the resources for that at the moment, given that people are still waiting for EZproxy support, the Word plugin, mobile apps, etc.

So it's nowhere near the top of our roadmap, but it is clearly something that might become an important part of our product in the long term.


In my opinion, transitioning away from PDFs should absolutely be a goal and a top priority. Aside from managing references, a primary use of applications like Paperpile and Mendeley is to read and annotate scientific papers in digital form. PDF is a great format for printing documents on paper, but it is poorly suited to reading on devices like widescreen computer monitors or tablets. Of course, legacy support for PDFs will remain important until there is a reliable and efficient way to convert older scanned PDFs to ePub along with their highlights and annotations, but why delay the inevitable? PDF will be replaced by ePub for digital reading; it's just a matter of when. Whichever application leads this transition will attract an exodus of users from other applications like Mendeley who consider this feature important enough to warrant leaving their current reference manager. Mendeley already beat you to the mobile apps and Word plugin. Try beating Mendeley to something if expanding your user base is a goal.
