Specify custom BibTeX key formats for entire library

Dear Paperpile team,

One of the first roadblocks when moving from another reference manager to Paperpile is the lack of an option to specify a uniform custom BibTeX key format for all the papers in the library. While this may not be a problem when you are writing exclusively in Google Docs, it is definitely a pain when you write in LaTeX and are used to a certain format of BibTeX keys.

For example, I don’t even think before popping in a ~\cite{last name of first author-year-first word of title}, so it’s a menace to find out later on that all your references are broken because the keys don’t match.

So my first request would be to introduce a way for users to specify a custom BibTeX key format for all the papers in their libraries. For users who seldom write in LaTeX, the key can be any default format.

Thanks for a lovely piece of software! :smile:

Yes, I would like this too. I am writing my PhD thesis in Markdown and have a lot of references. A consistent citation key format would simplify the process. I prefer keys like @FirstauthorYear. As it is, I have to go through all the citations I wrote before I started using Paperpile and update them.

I have started inserting citations into yet another paper and am running into the same BibTeX key format problem. Have you had a chance to consider the request for personalized default BibTeX key formats? I know Paperpile is very different from CiteULike, but CiteULike has this implemented. Do you think we could have something similar? Thanks so much!

Sorry I did not reply earlier; I missed this thread. Thanks for bringing it up again. It’s an important point and a frequently requested feature, and we definitely want to add it at some point.

+1 for this; it would be really useful. However, LastNameYear doesn’t always work: if the author’s last name is very common, or the author is very prolific, you may need a bit more information in the citation key. The format I usually use is a bit more complex:

  • LastNameYY for single author papers
  • LastName1LastName2YY for dual author papers
  • LastName+YY for more than two authors
  • Any of the above with a trailing letter [a-z] (e.g. “Smith+99a” or “JonesSmith14b”) to avoid duplicate citation keys

I wouldn’t even mind having to change my citation key format, so long as Paperpile enforced something simple and consistent that avoids duplicate keys. Ideally, though, there would be some clever way for users to specify the regular expression their citation keys should match.
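
The format above can be sketched in Python. The function names and the (last names, year) input shape are hypothetical, chosen only for illustration. Every key in a clashing group gets a letter, in input order:

```python
from collections import defaultdict
from string import ascii_lowercase


def base_key(last_names: list[str], year: int) -> str:
    """Build the base key from the authors' last names and a two-digit year."""
    yy = f"{year % 100:02d}"
    if len(last_names) == 1:
        return f"{last_names[0]}{yy}"                  # LastNameYY
    if len(last_names) == 2:
        return f"{last_names[0]}{last_names[1]}{yy}"   # LastName1LastName2YY
    return f"{last_names[0]}+{yy}"                     # LastName+YY


def assign_keys(refs: list[tuple[list[str], int]]) -> list[str]:
    """Append a, b, c, ... to every key whose base key clashes with another.

    Letters follow input order, and zip() silently stops after 26 clashes.
    """
    bases = [base_key(names, year) for names, year in refs]
    groups: dict[str, list[int]] = defaultdict(list)
    for i, b in enumerate(bases):
        groups[b].append(i)
    keys = list(bases)
    for b, indices in groups.items():
        if len(indices) > 1:
            for letter, i in zip(ascii_lowercase, indices):
                keys[i] = b + letter
    return keys
```

Because the letters depend on input order, which paper ends up as “a” is exactly the open question raised in the next reply.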

The pattern itself would not be the problem. We actually had a very early prototype with a very sophisticated pattern mechanism. The problem is de-duplication.

With the simple random de-duplication suffix we add now, we can generate the citation key deterministically, just from the reference data. If we allow Smith99a and Smith99b, we potentially need to check 10k other papers whenever a user changes an author. But that would not even be the hardest problem. The hardest problem is how to assign “a” and “b”.

What if I delete Smith99a from my library and then add a new Smith99? Will it become Smith99a again, or Smith99c because there is already a Smith99b? No matter what we do, we will get e-mails from people telling us our software is broken :wink:

If we re-assign the suffixes, the citations in their papers will no longer match.

If we keep the assignment forever, people will complain that there is a Smith99b without a Smith99a.

We had this problem with importing BibTeX files: we re-assign everything to make sure keys are consistent and free of duplicates. That’s a decision we made, but we can see the other side, where people lose their well-curated BibTeX keys.

Anyway, we hope to find a solution for customizable keys that is efficient and gets all the weird edge cases right.

Any input is more than welcome!
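
One way a random-looking suffix can still be deterministic, as described above, is to derive it from a hash of the reference’s own fields. This is an assumed scheme for illustration, not Paperpile’s actual algorithm; the function name and the choice of fields are hypothetical:

```python
import hashlib
from string import ascii_lowercase


def hashed_key(first_author: str, year: int, title: str) -> str:
    """Derive a key like 'Author2002-xy' purely from the reference's own data.

    Hypothetical scheme: the two-letter suffix comes from a SHA-256 digest, so
    the same data always yields the same key and no other library entries need
    to be consulted. With only 26 * 26 suffixes, collisions remain possible.
    """
    data = f"{first_author}|{year}|{title}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    suffix = ascii_lowercase[digest[0] % 26] + ascii_lowercase[digest[1] % 26]
    return f"{first_author}{year}-{suffix}"
```

The trade-off is that any edit to the hashed fields (say, fixing a typo in the title) produces a different key, while no lookup across the rest of the library is ever needed.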

Hi @stefan, that’s a really interesting UX problem! Any solution would add some extra complexity in several places, but personally I would resolve it this way:

  1. Add a check-box to every citation and similar label- and folder-level check-boxes to say “set citation key manually”, with a suitable “what is this?” to explain it. When the check-box is ticked, allow the user to edit the citation key in a text box, in the same way they edit paper titles and other data.
  2. When a user imports a new reference, set the key automatically. If the reference gets moved to a label or folder with the “set key manually” setting on, keep the automatic key but tick the “set key manually” check-box.
  3. When you import a new reference, query the references the user already has. If there are duplicate keys, set the keys automatically as above, but add a non-modal warning box to the top of the Paperpile front page the next time they visit, saying something like “could not disambiguate citation keys … view suggested keys… what is this?”, with “suggested keys” and “what is this” being hyperlinks. The “suggested keys” link should take the user to a page listing the ambiguous references, so they can tick the “set citation key manually” check-box and change the suggested keys. Each suggested key should have a short sentence next to it explaining how it was chosen, for example “Smith99b was chosen for this reference because you used to have a reference Smith99a which referred to Long Paper Title and was deleted on 01/01/2015”. This might mitigate some of those pesky emails.
  4. A manually edited citation key should never be changed automatically, unless the user un-ticks the check-box.
  5. Add a setting to allow the user to set the regex for the automatically generated key as well.
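
The proposal above can be sketched as a small data model. Everything here (the `Reference` class, `free_key`, `import_reference`, `set_manual_key`, `regenerate`) is hypothetical, and the warning text is simplified compared with the message suggested in point 3:

```python
from dataclasses import dataclass
from string import ascii_lowercase
from typing import Callable


@dataclass
class Reference:
    title: str
    key: str
    manual: bool = False  # the proposed "set citation key manually" check-box


def free_key(base: str, taken: set[str]) -> str:
    """Return base, or base plus the first unused letter (StopIteration past 'z')."""
    if base not in taken:
        return base
    return next(base + c for c in ascii_lowercase if base + c not in taken)


def import_reference(library: list[Reference], title: str, base: str) -> list[str]:
    """Add a reference with an automatic key; return messages for the warning box."""
    key = free_key(base, {r.key for r in library})
    library.append(Reference(title, key))
    if key != base:
        return [f"{key} was chosen for '{title}' because {base} is already in use"]
    return []


def set_manual_key(ref: Reference, key: str) -> None:
    """Ticking the check-box: the key is pinned and never regenerated (point 4)."""
    ref.key, ref.manual = key, True


def regenerate(library: list[Reference], pattern: Callable[[Reference], str]) -> None:
    """Apply a new automatic pattern (point 5) to every reference that is not pinned."""
    taken = {r.key for r in library if r.manual}
    for ref in library:
        if not ref.manual:
            ref.key = free_key(pattern(ref), taken)
            taken.add(ref.key)
```

Pinned keys are added to `taken` before regenerating, so an automatic key can never collide with a manual one.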

Not sure if that is useful to you or not, but it makes sense from my perspective as a user.

As it happens, I’ve come up against this problem myself: I’ve just started using Paperpile right in the middle of writing a paper, so I now can’t use my old LaTeX citations with the paper I’m writing. Probably the easiest thing for me would be to write my own citation key translator and run it every time I download a .bib. That won’t make it easy to share a folder with collaborators, and it will break as soon as you add integration with Overleaf! I think the lesson here is that there are never any easy answers with end-user software :frowning:
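
A citation key translator like the one mentioned above could be as simple as a regex pass over the downloaded .bib file. This is a minimal sketch that assumes a hand-maintained dictionary mapping each Paperpile key to the old key used in the manuscript; the function names and the example keys are hypothetical:

```python
import re

# Matches the start of an entry such as "@article{Oeppen2002-sf," and captures the key.
ENTRY_RE = re.compile(r"(@\w+\s*\{\s*)([^,\s]+)(\s*,)")


def rename_bib_keys(bib_text: str, mapping: dict[str, str]) -> str:
    """Replace each entry key found in mapping; other keys are left unchanged."""
    def replace(m: re.Match) -> str:
        old = m.group(2)
        return m.group(1) + mapping.get(old, old) + m.group(3)
    return ENTRY_RE.sub(replace, bib_text)


def rename_bib_file(path: str, mapping: dict[str, str]) -> None:
    """Rewrite a downloaded .bib file in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(rename_bib_keys(text, mapping))
```

Only entry headers are touched; field values are left alone because the pattern requires a leading `@`.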

I would like to have user-defined cite keys on some publications and Paperpile-generated ones on others. My workflow is to export to BibTeX and then write in a TeX editor. Some papers I cite only in a specific manuscript, but other papers are central to my research and I cite them over and over. For these papers I like to have specific mnemonics.

I’m using Zotero at the moment, but I’m looking into Paperpile and Docear. The Zotero plugin Better Bib(La)TeX has great support for generating user-friendly, unique citekeys. See the following page for details: BBT Citation Keys.

Here are the bullet points:

Standard Zotero behaviour:

  • If a non-unique key is generated, which one gets postfixed with a distinguishing character is essentially non-deterministic.
  • The keys are always auto-generated, so if you correct a typo in the author name or title, the key will change.
  • You can’t see the citation keys until you export them.

Better Bib(La)TeX behaviour:

  • Set your own, fixed citation keys
  • Stable citation keys, without key clashes. BBT generates citation keys that take into account other existing keys in your library in a deterministic way, regardless of what part of your library you export, or the order in which you do it.
  • Generate citation keys from JabRef patterns

Of all the citation key management approaches I have tried, I find this the best way for me:

  • Creating a unique citation key for each paper I add to my reference manager should be taken care of automatically, but I as a user should be able to give instructions to the automation (any existing fields can be used during generation).
  • If I have set a custom key, the system should remember this and not overwrite it.
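
The “stable regardless of export order” property described above can be illustrated by disambiguating against the whole library, ordering clashing items by an immutable ID. This is an assumed approach for illustration, not necessarily how Better Bib(La)TeX works; `library_keys`, `export`, and the integer IDs are hypothetical:

```python
from collections import defaultdict
from string import ascii_lowercase


def library_keys(library: dict[int, str]) -> dict[int, str]:
    """Map each item ID to a unique key, computed over the whole library.

    library maps a hypothetical immutable item ID to its base key (e.g. "Smith99").
    Clashing items are ordered by ID, so the result does not depend on which
    items are exported or in what order.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for item_id, base in library.items():
        groups[base].append(item_id)
    keys: dict[int, str] = {}
    for base, ids in groups.items():
        if len(ids) == 1:
            keys[ids[0]] = base
        else:
            for letter, item_id in zip(ascii_lowercase, sorted(ids)):
                keys[item_id] = base + letter
    return keys


def export(library: dict[int, str], selection: list[int]) -> list[str]:
    """Keys for an exported subset, taken from the library-wide assignment."""
    keys = library_keys(library)
    return [keys[i] for i in selection]
```

This fixes export-order instability but not deletion: removing item 3 from `{3: "Smith99", 7: "Smith99"}` turns Smith99b into Smith99. Storing keys once they are assigned, as the second bullet asks for custom keys, would avoid that.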

It has been an eternity since this issue was first reported, and an eternity since anyone commented on it. Any update on whether or when this might get implemented? I’m a recent convert from Mendeley, and I sorely miss being able to control how citation keys get formed.

Thank you for reviving this topic. I realize it’s been a long time, and we haven’t been able to prioritize it. As we continue developing in several directions, we are approaching a point where we’ll be able to focus on improvements and new features. This one will certainly be among them.

As Vicente said, this has been on the back burner for quite some time, but we have not forgotten about it. There are quite a few other BibTeX/LaTeX-related features (like automatically exporting BibTeX files and Overleaf integration) on our roadmap.

Here is a pretty simple solution that might work: don’t treat the keys as UUIDs in your database. Instead, generate keys from the user’s template string, if one is specified, at the time you output the BibTeX entries for each item.

If multiple entries are output, just dedupe within that list of publications. If the keys overlap with other entries the user already has in their LaTeX project, that is their problem, not yours.

The citation key format I have been using is the Google Scholar one: kaelbling1993learning, where “learning” is the first non-trivial word of the title. This disambiguates keys well enough that I never have problems with conflicts. Note that it is also all lower-case, which your template engine should be able to accommodate.

`${firstAuthor.lastName.lower}${year}${title.firstWord.lower}`
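
The export-time approach above can be sketched as a tiny template renderer plus dedupe within the exported list. The `${field.lower}` syntax follows the template in the post, but the field names, the stop-word list, and the dict shape of a reference are all hypothetical:

```python
import re
from collections import Counter, defaultdict
from string import ascii_lowercase

STOPWORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to", "with"}  # hypothetical list


def first_word(title: str) -> str:
    """First title word that is not a stop word (falls back to the first word)."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return next((w for w in words if w.lower() not in STOPWORDS), words[0] if words else "")


def render(template: str, ref: dict) -> str:
    """Expand ${field} and ${field.lower} placeholders for one reference."""
    fields = {
        "firstAuthor.lastName": ref["authors"][0],
        "year": str(ref["year"]),
        "title.firstWord": first_word(ref["title"]),
    }

    def sub(m: re.Match) -> str:
        path = m.group(1)
        value = fields[path.removesuffix(".lower")]
        return value.lower() if path.endswith(".lower") else value

    return re.sub(r"\$\{([^}]+)\}", sub, template)


def export_keys(template: str, refs: list[dict]) -> list[str]:
    """Render keys at export time and dedupe only within this export."""
    keys = [render(template, r) for r in refs]
    counts = Counter(keys)
    seen: dict[str, int] = defaultdict(int)
    out = []
    for k in keys:
        if counts[k] > 1:
            out.append(k + ascii_lowercase[seen[k]])  # a, b, c, ... for clashes
            seen[k] += 1
        else:
            out.append(k)
    return out
```

Since nothing is stored, keys are recomputed on every export, so editing a reference’s author, year, or title changes its key.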

I’m not sure what the best approach to this problem is, but my issue is that BibTeX keys can differ for the same reference if I export it at another time. For example, when I started writing my thesis, a certain reference was exported as Oeppen2002-sf, but now that I am redrafting, the same reference is exported as Oeppen2002-zd.
About the earlier question of whether suffixes like Smith99a and Smith99b get re-assigned after a deletion: please don’t change them automatically. (Maybe it could be a manual option for anyone who wants it.)
For now I am making a new Paperpile folder for the references I am definitely using in my thesis, plus new references added as I redraft, but I will have to mix newly exported references with copies of old exported entries to avoid breaking citations in text I wrote a couple of years ago.