User-accessible log, more verbose real-time reporting on the synchronisation status of Papers, and more reliable search

Hi team,

Here are some suggestions for the new Paperpile, along with feedback on the Beta:

  1. User-accessible log. It would be really useful to know what's happening in the background, for a variety of reasons. Most apps offer verbose logging to help debug problems; such an option could be very useful for ironing out bugs in the Beta.

  2. More verbose real-time sync status (with descriptive labels) in the My Library section when changes have been made en masse. This would confirm that things are running correctly without the hassle of reading log files. It wouldn't provide the level of detail that verbose logging could, but seeing a handful of metrics reported in real time would be useful.

  3. Better use of memory, storage, and multiple cores. I'm running Paperpile on systems with upwards of 24 cores, PCIe SSDs, and 64GB of RAM, yet search is still very slow and caching doesn't seem to be happening or working.

  4. Reporting on the status of PDFs that are unsearchable. Some academic journals and libraries provide old documents as PDFs scanned as images. Other times, stupid anti-piracy measures stop the documents from being searchable. This is a massive issue when searching one's own Paperpile database: receiving no hits makes one think there are simply no matches, not that vital documents are inaccessible to the search function.

  5. Search indexing status is unclear and unreliable. Being able to search my document pile rapidly is vital. I have some 16,000 papers, and in Google one can do a keyword search through 16,000 PDFs in under a second. Paperpile, however, adds vital metadata to each file, which makes it my preferred way to comprehensively search all of my papers. At present, though, search performance in the Beta is extremely variable. Some days it is amazing, totally spot on; other days it is hot garbage, verging on useless. Transparency about why this happens (is it still crawling the files?) would be a good middle ground, assuming indexing cannot happen instantly upon upload.

  6. Ability to force a re-scan/re-identification of previously imported PDFs. I have a large number of PDFs that we imported before the Beta. Ideally, I could force a full re-scan of these documents for ISBNs and similar identifiers.

Hope this helps.

Thank you for your helpful feedback, @ajn. We will discuss points 1-3 and 6 internally as UX requests. On point 4: scanned PDFs are not currently searchable, but the team plans to fix this.

And thanks for alerting us to the issues you are having when searching your library. The team are also implementing a fix for search in large libraries that should be available in the next release. It would still be helpful to have more information on the search issue you are seeing: is it performance (search is too slow), or is it result relevance (relevant references are not shown)? Or do you find that, after uploading PDFs, it takes time for them to appear in the search results? Let me know.

Noted, many thanks for the update.

The issues appear to be limited to:

  1. The time it takes after upload for the contents of a PDF to appear in search results.
  2. Occasionally, search does not find a term in papers/PDFs that I know contain it, even though they were uploaded a week or more ago.

I suspect point 2 happens because some PDFs cannot be indexed as searchable (because they are images or have anti-piracy features). Hence I'd single out transparency about what is in the index as the critical functionality that appears to be missing.

That is, a delay in indexing (hopefully a short one) is broadly acceptable; I see that as part of the cost/benefit assessment of the Paperpile service. Unreliable indexes, however, verge on making the indexing service unusable in high-pressure or academic settings, where one should be able to rely on it to track key terms and identifiers.

Thank you for your clarification of the search issues, @ajn. There are known issues with indexing jobs in the beta and the team are working on fixing them.