Full text search in PDFs

jonesjeffr · May 27, 2018, 7:43pm

This will never happen

stefan · May 31, 2018, 5:05pm

I’ve written some words on the progress of things in another thread. Improved search of all kinds (your references, full text of PDFs, and external databases) is part of our long-term strategy. Jirka also posted a quick intro; he is our new database and backend engineer working on search infrastructure.

http://forum.paperpile.com/t/is-paperpile-development-dead/2897/19

http://forum.paperpile.com/t/is-paperpile-development-dead/2897/24

Peyman_Obeidy · June 15, 2018, 4:24am

I would like to see full-text search too.

Edward_Richards · September 26, 2018, 9:28pm

I have been using pdf managers for some years. I was just about to start with Paperpile, when I found this thread. Full text searching is the most basic function for a researcher. While you have many other sexy projects underway, all pale in importance to basic text searching. I encourage you to do some focus groups with folks who use these products to better match your development priorities with the needs of your potential customers.

Andrey · September 26, 2018, 10:11pm

@Edward_Richards exactly my thoughts from Sept 2014!..

ChemBioUSC · October 24, 2018, 7:06am

I don’t think it’s fair to say that the software isn’t being updated. That said, it’s true that some features have been in the pipeline for a long time. Full-text search would be my top pick, but so was the iOS version, and they are delivering it. I’m happy with Paperpile and do not plan to switch.

Andrey · October 24, 2018, 1:52pm

I second that, and I do not plan to switch either (btw, I am not aware of any alternatives for managing references in Google Docs). But at the same time, similar to what I said 2-3 years ago, it is still very unclear how Paperpile defines priorities for their development.

ChemBioUSC · October 24, 2018, 3:33pm

Yes, there’s stuff that’s been in the making for a long time. Still, I’ve tried many other alternatives (Mendeley, Zotero, ReadCube, Papers, Sente, BibDesk, EndNote) and I think that Paperpile is the best option.

John_Curry · December 1, 2018, 10:14pm

It is great to hear that advanced search is on the roadmap, as search and a Word plug-in are still my top two remaining features.

Would any of the staff be able to provide any kind of updates on this? @jirka

Napier · February 5, 2019, 4:48pm

Any word on the development schedule of full text search? Along with multi-document annotation mapping and export this is one of the key features that’s really holding Paperpile back.

jirka · February 6, 2019, 9:15am

Napier, please, do not abuse the flagging feature.

We do take the feedback into account, which is why we have the Mobile App beta and Word Plugin beta running. We are also updating our infrastructure behind the scenes to make sure we can support all the features we would like to implement. When there is any news to share in regards to the fulltext search, we will do so here on the forums.

Napier · February 6, 2019, 12:20pm

I apologise and stand corrected – I did not realise that flagging such a long running topic for your attention and requesting an update from Paperpile constituted anything other than common sense. No developer response has been provided since October '18 and full-text search has been discussed but not developed since 2014. Being able to search the indexed content of a collected research library is of key importance to any conceivable research and writing workflow.

I am glad that you are laying the infrastructure groundwork for future improvements to Paperpile but I’m sure that you can also understand the growing impatience of many of your long standing paying users who rely on your software academically and professionally. I don’t intend to be overly critical – Paperpile is very good – it’s just that there are some glaring feature gaps that prevent it from being brilliant.

Andrey · February 6, 2019, 2:46pm

@Napier - well said! The lack of this feature is the reason I need Mendeley to search my Paperpile PDF library. For whatever reason, and as you should be able to see from the very start of this thread, Paperpile did not consider this feature important and never prioritized its development. Unfortunately so.

Karen_Breakey · February 25, 2019, 5:52pm

Here’s a workaround to search PDFs stored in Paperpile.

Google Drive contains a Paperpile folder with all your documents. Instead of opening the PDF in Paperpile in your browser (which takes you to Paperpile’s PDF reader without search capacity), try opening it from the Paperpile folder (Google Drive -> Paperpile -> search for document and click open). Then you get the option to choose a PDF reader other than Paperpile’s. For example, I use Preview, which allows me to search any individual PDF.

From Finder on a Mac, I can also focus a text search on just the PDFs in my Paperpile folder. Working for me so far.

Kerim · February 26, 2019, 10:35am

The beta PDF annotator from Paperpile does offer search. You can turn this on from the settings menu, under “browser integrations” > PDF Viewer > “viewer with annotations (beta)”.

Karen_Breakey · March 2, 2019, 11:19pm

Wow, thanks. Seems like it works great. (Software developer husband says thumbs up!). I think others would appreciate this too.

KCF · March 24, 2019, 10:44pm

My workaround, and I realize this might not work for most people, is to copy the papers I need to search onto my laptop or cloud storage, tag the papers with a project-specific tag, and do the search from there. This does work from multiple different laptops and computers, but it doesn’t allow a search on files that are in Paperpile but not in a folder that is obviously related to my current project.
If you are searching for an exact phrase, Google Scholar can sometimes be helpful.

I agree this is an important feature, and would very much like to have it.

Daniel · May 12, 2019, 5:06pm

I’ve now tried multiple options for a full-text search workaround:

I used DocFetcher (portable, free, open source) to index the Paperpile folder in my Google Drive.
With that I get super fast full-text search, but the preview is plain text only, which is OK but doesn’t look nice. Sometimes you get strange PDF mumbo jumbo effects like doubled text.

Foxit Reader and Acrobat DC also allow you to run a text string search on a folder, but that can take a long time since these apps do not create an index. I think Acrobat DC Pro offers indexing, but Adobe’s pricing is bs.
The nice effect here is that you get a collapsible list of preview snippets, and clicking one opens the PDF directly at the right position. That would be the dream feature for Paperpile, especially because I have this nice free white space on the right in Paperpile.
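
To illustrate why an indexed search returns results quickly while a folder scan has to reread every file, here is a minimal Python sketch of an inverted index with preview snippets. It assumes the PDF text has already been extracted into plain strings; the file names, the sample text, and the function names are made up for illustration, and this is not how DocFetcher, Acrobat, or Paperpile actually implement search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Yield (lowercased word, character offset) pairs for a text."""
    for match in re.finditer(r"\w+", text):
        yield match.group().lower(), match.start()


def build_index(documents):
    """Map each word to {document name: [character offsets]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        for word, offset in tokenize(text):
            index[word][name].append(offset)
    return index


def search(index, documents, word, context=30):
    """Return (document name, snippet) pairs for every hit of one word."""
    hits = []
    # Only documents listed under the word are touched; nothing is rescanned.
    for name, offsets in sorted(index.get(word.lower(), {}).items()):
        text = documents[name]
        for offset in offsets:
            start = max(0, offset - context)
            end = min(len(text), offset + len(word) + context)
            hits.append((name, text[start:end].strip()))
    return hits


# Hypothetical, already-extracted text standing in for two PDFs.
documents = {
    "smith2017.pdf": "We measured protein folding rates under heat stress.",
    "lee2018.pdf": "Heat maps summarise the folding data for each sample.",
}
index = build_index(documents)

for name, snippet in search(index, documents, "folding"):
    print(f"{name}: ...{snippet}...")
```

The index is built once up front, so each query is a dictionary lookup plus a little slicing to produce the snippet. The stored offsets are also what would let a viewer jump to the right position in the document.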

Bill · September 5, 2019, 9:02pm

Thanks Daniel - this is a great workaround. I really can’t believe this feature is still completely missing and is seemingly being ignored…18+ months since any type of developer feedback on the issue

stefan · September 26, 2019, 8:18am

I understand how useful this feature would be and workarounds are never as good as comprehensive, fast, and accurate search right in our app. That’s still our goal and we are far from ignoring the issue.

Originally, the files were only stored in Google Drive, which made it impossible for us to index them efficiently. That’s why we spent the last year or so rewriting our backend and transferring tens of millions of PDFs to a new backend infrastructure, which will eventually allow us to search the files. For technical reasons, though, you will see some other search improvements before full-text search is added.