Adjust PDF page number to actual page numbers of the article in the journal


#1

I’ve been using the new PDF viewer the past few days and wanted to add another suggestion: at the moment, notes display what page they’re on within the PDF (p. 1, 2, 3 etc.) but these rarely match up with the actual page numbers in the document itself (which is usually from a journal several hundred pages long). This means that if I export a note, I have to refer back to the PDF to check which page it references before I can cite it properly.

It would be really useful if you could tell the PDF viewer somehow what page numbers in the document the pages of the PDF match up to - e.g. tell it that page 1 of the PDF is actually page 125 of the paper itself. Notes could then display the actual page number they relate to rather than just the page of the PDF document they’re on (which is a bit useless in and of itself).

This would make exported notes a lot more useful, as you could use them directly for referencing without having to refer back to the original PDF to check which actual page number each one relates to.


#2

I can see how that is useful but it’s not trivial to implement. We would have to analyze the PDF for page numbers and that’s complicated for many reasons.


#3

Hi Stefan, that sounds fantastic - it’s not a massive pain at the moment, but saving to Drive would be just a useful extra feature to have, glad you’re considering it!

As regards the PDF page number problem, I totally understand that having the reader analyze a document to determine page numbers would be a very complicated thing to do. I was wondering, instead, if it would be possible to basically just have a dialogue box where the user could manually tell the reader from which page number it should start counting?

So instead of having the reader try and determine which pages were which itself, the user could just manually tell it something along the lines of ‘Page 1 = Page 120’ and then the reader could just adjust its page counting up by the necessary amount (so start counting pages from ‘120’ instead of from ‘1’).
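A minimal sketch of the offset idea, not how the reader actually works: the function name `nominal_page` and its parameters are hypothetical, and it assumes the journal pages run consecutively from the first PDF page.

```python
def nominal_page(pdf_page: int, first_nominal_page: int) -> int:
    """Map a 1-based PDF page index to the page number printed in the journal.

    first_nominal_page is the number the user enters for PDF page 1
    (hypothetical setting, e.g. 120 for 'Page 1 = Page 120').
    """
    if pdf_page < 1:
        raise ValueError("PDF pages are numbered from 1")
    # Shift every page by the same fixed offset.
    return first_nominal_page + (pdf_page - 1)


# With 'Page 1 = Page 120', a note on PDF page 3 would be shown as p. 122.
print(nominal_page(3, 120))
```

A single offset like this only works when the PDF has no unnumbered cover sheets or gaps in its page sequence.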

Hope that makes sense! :smile:

(On a side note, it’s absolutely fantastic to see developers being so responsive to users, and it’s been great seeing Paperpile develop over time - I’m really looking forward to the future of the app, it just seems to keep getting better and better!)


#4

I second this. It’s kind of essential. Without it, it’s not possible to use Paperpile for annotations, as I won’t be able to correctly cite any notes I’ve made when I come back to the paper to write up an article after a few months.

It almost makes the annotations redundant for serious research purposes.

If it’s hard to implement, might a possible workaround be to allow the user to edit the page number in the annotation sidebar? At the moment this doesn’t seem possible.

Nevertheless, this feature is quite exciting.


#5

I should add that the ZotFile add-on for Zotero does this excellently. It’s the only implementation I’ve seen work. Papers also fails. Are there not clues in how ZotFile extracts annotations from PDFs with the correct page numbers that Paperpile could learn from?


#6

We are looking into that. Thanks for the tip.

I might be wrong and @andreas or @jason might correct me. But I think we’ve implemented that feature and it will be part of an upcoming update.


#7

@stefan That is true, we have gotten the basics working for a future update, although in its current implementation it only affects future annotations. We hope we can change that before the release.


#8

When is this update planned? I’d rather wait until it’s ready before I start annotating.


#9

We don’t share our internal schedules for updates and new features any more. That turned out to be a bad idea because things change all the time.

All I can say at this point is that there is a very big update coming up for the PDF viewer early next year and it should be part of that.


#10

Thank you. Are there any plans to improve the responsiveness of the PDF viewer, or is this a Chrome problem? For instance, PDFs are quite jerky when scrolling. This does not happen with the same PDFs in Preview or PDF Expert. I’m using a 2015 rMBP with 16 GB RAM so I don’t think it’s a hardware problem.


#11

For me, the jerkiness tends to occur with large, text-dense PDF documents. I’m running Chrome on a Win 8.1 machine with only 2 GB RAM. Perhaps there’s some sort of server-computed rendering for display in MetaPDF?


#12

@ngg

We will investigate. Do you see the same problems when you open the PDF file with this viewer?

https://mozilla.github.io/pdf.js/web/viewer.html


#13

Yes, a little, but it’s not as obvious as in the MetaPDF viewer. I think the jerkiness tends to occur with one particular 2-column article (20 pages, 1.23 MB PDF) that I have. Most of the time, things are OK. Anyway, I tried out this article in the Mozilla viewer and compared it to MetaPDF. The MetaPDF experience was still jerkier - chunks of text appeared after a brief pause and in a left-column-then-right-column order. A transient “Loading” watermark often appeared during this 1-2 second period. Scrolling was also not as smooth or responsive as in the Mozilla viewer. Scrolling back to previous pages also triggered the “Loading” watermark. Not sure why, but fortunately, it’s only this PDF at the moment. My current Chrome is Version 55.0.2883.87 m (64-bit).


#14

Are accurate page numbers for annotations still planned for release? Anxiously and eagerly waiting.


#15

Wanted to share an idea. A simple way of implementing this would be to include in MetaPDF a table of page number mappings. The user could open the table in a dialog box and add rows that describe relationships between the PDF page numbers and the source page numbers. Something like this:

PDF <–> Source
[1-4] <–> [97-100]
[5-20] <–> [200-215]
[21] <–> [329]

Then when the user exports notes, this table can be cross-referenced to correctly identify the source page that the annotation belongs to. I don’t know much about PDFs, but maybe this is information that can be embedded in the PDF itself?
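The lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PageRange`, `MAPPING`, and `source_page` are hypothetical names, not part of MetaPDF, and each row is assumed to map a run of consecutive PDF pages onto an equally long run of consecutive source pages.

```python
from typing import NamedTuple, Optional, Sequence


class PageRange(NamedTuple):
    """One row of the user-entered table (hypothetical structure)."""
    pdf_start: int     # first PDF page of the run (1-based, inclusive)
    pdf_end: int       # last PDF page of the run (inclusive)
    source_start: int  # source page number matching pdf_start


# The example table from the post: [1-4] <-> [97-100], [5-20] <-> [200-215], [21] <-> [329]
MAPPING = [
    PageRange(1, 4, 97),
    PageRange(5, 20, 200),
    PageRange(21, 21, 329),
]


def source_page(pdf_page: int, mapping: Sequence[PageRange] = MAPPING) -> Optional[int]:
    """Return the source page for a PDF page, or None if no row covers it."""
    for row in mapping:
        if row.pdf_start <= pdf_page <= row.pdf_end:
            # Offset within the run carries over to the source numbering.
            return row.source_start + (pdf_page - row.pdf_start)
    return None


# An annotation on PDF page 7 falls in the second row.
print(source_page(7))
```

Returning `None` for unmapped pages would let an exporter fall back to the plain PDF page number instead of guessing.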


#16

Yesterday’s update to the PDF Annotator included many fixes and improvements including the use of nominal page numbers (where available) for the annotations pane.