Spurious word "abstract" in the abstract field


#1

This is only a tiny cosmetic issue, but I guess it would be easy to fix: When adding papers to Paperpile, the word “abstract” often gets included as the first word of the abstract field (often without a space before the real first word of the abstract). It would be nice if Paperpile could automatically strip out that extra word when adding a paper.


#2

I have not seen that, but given the data quality we get from some sources, it does not come as a surprise. Do you have any concrete examples?

The only problem I see is that if we implement this, somebody will come to this forum and say that the first word of their abstract is gone, when it should read “Abstraction of X…” or “Abstract ideas are…”


#3

Yes, searching for the word “abstract” in my Paperpile database produces dozens of examples like this. These are the most recent examples of papers I’ve imported where this has happened:
http://www.journals.uchicago.edu/doi/abs/10.1086/686793
http://www.tandfonline.com/doi/full/10.1080/19439342.2016.1206607
http://www.tandfonline.com/doi/abs/10.1080/00220388.2011.625408

In the first case, I can see that the publisher has included the word “Abstract” at the beginning of the abstract field in the RIS file, so it's understandable that Paperpile imports that. In the other two cases, the abstract is not included in the RIS file. In the second case, the word ABSTRACT is included in capital letters in the Paperpile field, as it appears on the webpage for that paper.

As I said, this is only a very minor niggle, and I do see that there would be a problem when the abstract legitimately begins with the word “Abstract”. I guess that it would be possible to test whether “Abstract” is followed either by an upper-case letter or by a space and then an upper-case letter, and to remove it in either of those situations…
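
To make the suggested test concrete, here is a rough sketch in Python. It is only an illustration of the heuristic described above, not how Paperpile actually works: the function name `strip_abstract_label` and the pattern are my own assumptions. It removes a leading “Abstract” when an upper-case letter follows it directly or after one space, and a leading all-caps “ABSTRACT” only when a space and an upper-case letter follow (so that an all-caps “ABSTRACTION …” is left alone).

```python
import re

# Hypothetical helper, not Paperpile's real code.
# Alternative 1: "Abstract" + optional single whitespace, then an upper-case letter.
# Alternative 2: "ABSTRACT" + one whitespace, then an upper-case letter.
# The lookahead (?=[A-Z]) checks the next letter without consuming it.
_LABEL = re.compile(r"^\s*(?:Abstract\s?|ABSTRACT\s)(?=[A-Z])")


def strip_abstract_label(text: str) -> str:
    """Remove a spurious leading 'Abstract' label, if the heuristic matches."""
    return _LABEL.sub("", text, count=1)


if __name__ == "__main__":
    samples = [
        "AbstractThis paper examines rural credit markets.",
        "Abstract This paper examines rural credit markets.",
        "ABSTRACT This paper examines rural credit markets.",
        "Abstract ideas are hard to measure.",
        "Abstraction of X is a common modelling step.",
    ]
    for sample in samples:
        print(repr(strip_abstract_label(sample)))
```

The last two samples are left untouched because a lower-case letter follows “Abstract”. The heuristic would still wrongly strip an abstract that genuinely starts with something like “Abstract Algebra …”, so it reduces the problem rather than eliminating it.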