** DONE UOMF: Reference Management with Org Mode        :blog:typography:software:research:pim:emacs:
CLOSED: [2015-12-26 Sat 12:55] SCHEDULED: <2015-12-26 Sat>
:PROPERTIES:
:CREATED: [2015-12-26 Sat 11:02]
:ID: 2015-12-26-reference-management-with-orgmode
:END:
:LOGBOOK:
- State "DONE"       from "DONE"       [2022-08-08 Mon 20:06]
- State "DONE"       from "DONE"       [2021-06-16 Wed 13:27]
- State "DONE"       from "NEXT"       [2015-12-26 Sat 12:55]
:END:

- Updates
  - 2019-09-25: added to [[id:2019-09-25-using-orgmode][blog series "Using Org Mode Features"]]
  - 2021-06-16: comment on =oc.el=
  - 2022-08-08: A Workflow by Koustuv Sinha

Please do read [[id:2019-09-25-using-orgmode][my "Using Org Mode Features" (UOMF) series page]] for
an explanation of the articles in this series.

While I was a [[https://en.wikipedia.org/wiki/Personal_information_management][PIM researcher]] at [[http://TUGraz.at][Graz University of Technology]], I used
[[https://en.wikipedia.org/wiki/Org-mode][Emacs Org-mode]] for managing references to research papers and books.

My starting point was [[https://tincman.wordpress.com/2011/01/04/research-paper-management-with-emacs-org-mode-and-reftex/][this description of a workflow]]. I added many
features to the workflow and [[https://github.com/novoid/extract_pdf_annotations_to_orgmode][described it on GitHub]].

John Kitchin from Carnegie Mellon University ([[http://kitchingroup.cheme.cmu.edu/blog/category/org-mode/][blog]], [[http://kitchingroup.cheme.cmu.edu/blog/category/org-mode/][Twitter]], [[https://github.com/jkitchin][GitHub]])
is an awesome Org-mode user and contributor. Every workflow he has
implemented with Org-mode is a great source of inspiration for many
similar workflows I use. If you're into teaching or doing research,
you definitely have to follow his work!

John has implemented reference management with Org-mode as well:
[[https://github.com/jkitchin/org-ref][org-ref]] recently got a major update. This update prompted me to
write this blog post explaining my method and his, and comparing the
two approaches. It should give you a good starting point for deciding
how you are going to use Org-mode in your reference management
workflow.

*** My Reference Management Workflow
:PROPERTIES:
:END:

My method for managing information about papers is based on the
premise that I write papers in (pdf)LaTeX directly and not with
Org-mode. It consists of one Org-mode file holding one heading per
reference with zero or more sub-headings. Here is an example with a
sub-heading for the abstract:

: *** Voit2012 - TagTrees: Improving Personal Information Management using Associative Navigation   :PIM:
: :PROPERTIES:
: :CREATED: <2012-09-17 Mon 17:48>
: :ID: Voit2012b
: :END:
:
: [[bib:Voit2012][Voit2012.bib]]
: [[pdf:Voit2012][Voit2012.pdf]]
:
: **** Abstract
:
: #+BEGIN_QUOTE
: This dissertation gives an overview of research related to Personal
: [...]
: #+END_QUOTE

I am using my own ~reftex-set-cite-format~ for adding new references
to my collection. It's described on [[https://github.com/novoid/extract_pdf_annotations_to_orgmode#bonus-emacs-setup][GitHub]], and you can find [[https://github.com/novoid/dot-emacs][my
currently used Emacs setup on GitHub]] as well. To add a new reference
heading in Org-mode, I only have to create and fill in a Bibtex file
by hand, press ~C-c ) h~, and select the new reference.

Each reference has one Bibtex file (~Voit2012.bib~), one PDF file
(~Voit2012.pdf~), and one optional PDF file containing my PDF notes
(~Voit2012-notes.pdf~), all in the same folder. To link to any of
them, I press ~C-c )~ followed by ~b~, ~r~, or ~p~, which inserts a
link to the Bibtex file, the Org-mode reference, or the PDF file. To
make those links work, I added the corresponding link abbreviations to
~org-link-abbrev-alist~ ([[https://github.com/novoid/extract_pdf_annotations_to_orgmode#bonus-emacs-setup][described here]]).
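Because the whole library relies on this naming convention
(~KEY.bib~, ~KEY.pdf~, optional ~KEY-notes.pdf~ in one folder), a few
lines of shell are enough to check it. The following is a minimal
sketch and not part of my setup: the function name ~check_library~ is
hypothetical, and it assumes that every reference has a ~.bib~ file
and that the generated collection is called ~references.bib~.

#+BEGIN_SRC sh
# Hypothetical helper: for every per-reference Bibtex file in a library
# directory, report whether KEY.pdf and the optional KEY-notes.pdf exist.
check_library() {
    library_dir=${1:-.}
    for bib in "$library_dir"/*.bib; do
        [ -e "$bib" ] || continue              # the glob matched nothing
        key=$(basename "$bib" .bib)
        [ "$key" = references ] && continue    # skip the generated collection
        if [ -e "$library_dir/$key.pdf" ]; then pdf=yes; else pdf=MISSING; fi
        if [ -e "$library_dir/$key-notes.pdf" ]; then notes=yes; else notes=no; fi
        # one tab-separated line per reference key
        printf '%s\tpdf=%s\tnotes=%s\n' "$key" "$pdf" "$notes"
    done
}
#+END_SRC

Calling ~check_library ~/archive/library~ prints one line per
reference, so missing PDF files stand out with ~pdf=MISSING~.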

Occasionally I want to have one big Bibtex file holding all
references. To generate it, I use the following script:

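#+BEGIN_SRC sh
#!/bin/sh
# Concatenate all per-reference Bibtex files (Voit2012.bib, ...) into one
# big references.bib. The glob only matches files starting with an
# upper-case letter, so references.bib itself is never included.
# The ">" redirection overwrites an existing references.bib, so no "rm"
# is needed (a plain "rm" would stop the && chain on the very first run,
# when references.bib does not exist yet).
cd ~/archive/library && \
    cat [[:upper:]]*.bib > references.bib
#+END_SRC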

Since I was writing my papers in LaTeX directly (no Org-mode export),
I wrote two handy scripts to support my reference workflow. I cited
references in my TeX file, e.g., ~\cite{Voit2012}~, compiled the
document, and got warnings about missing references, since I did not
use any Bibtex file yet. Then I invoked a shell script that parses the
LaTeX temporary files containing these warnings and generates a
Bibtex file from them. This way, each paper in progress got a Bibtex
file holding only the few references it actually cites.
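The real scripts are the gists linked in the next paragraph. As a
minimal sketch of the same idea, and not a copy of them, the following
functions (~undefined_citations~ and ~build_paper_bib~ are hypothetical
names) read the LaTeX log file, collect the keys of undefined
citations, and concatenate the matching per-reference Bibtex files. It
assumes warnings of the form ~Citation `Voit2012' on page 1 undefined~
that are not wrapped across lines by TeX.

#+BEGIN_SRC sh
# Print each undefined citation key from a LaTeX .log file once, sorted.
# Matches lines like:
#   LaTeX Warning: Citation `Voit2012' on page 1 undefined on input line 12.
undefined_citations() {
    sed -n "s/.*Citation \`\([^']*\)'.*undefined.*/\1/p" "$1" | sort -u
}

# Usage: build_paper_bib LOGFILE LIBRARY_DIR OUTPUT_BIB
# Concatenates LIBRARY_DIR/KEY.bib for every undefined citation KEY.
build_paper_bib() {
    : > "$3"                                   # start with an empty file
    for key in $(undefined_citations "$1"); do
        if [ -e "$2/$key.bib" ]; then
            cat "$2/$key.bib" >> "$3"
        else
            echo "no Bibtex file for $key" >&2
        fi
    done
}
#+END_SRC

After ~build_paper_bib paper.log ~/archive/library paper.bib~, running
~bibtex~ and LaTeX again should resolve the citations.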

You can find one version of the [[https://gist.github.com/novoid/7c3bf7360e8471364560][script for Bibtex]] and another version
of the [[https://gist.github.com/novoid/56f25ed1e15dc0524485][script for biber/Biblatex]] on GitHub.

As a very cool bonus, I developed a method to extract PDF annotations.
I was reading research papers on my Android tablet, doing simple
highlighting and writing some remarks as annotations in the PDF file
using RepliGo Reader for Android, which unfortunately has been
discontinued.

However, any other app that writes standard PDF annotations into the
PDF file should work. If yours does not, you have to find out how your
PDF tool stores annotations by looking at the PDF file source directly
and then adapt the parsing lines in my script.
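To get a first impression of which annotation types a PDF file
contains, a rough sketch like the following may help. It is not part
of the extraction script; the function name ~annotation_subtypes~ is
hypothetical, and it assumes that the PDF objects are stored
uncompressed. Many PDF files keep annotations in compressed object
streams, which would have to be unpacked first, for example with
~qpdf --qdf --object-streams=disable in.pdf out.pdf~.

#+BEGIN_SRC sh
# Count common markup/comment annotation subtypes in an uncompressed PDF.
# grep -a treats the binary file as text; the alternation restricts the
# matches to annotation subtypes, so fonts (/Type1) or images are ignored.
annotation_subtypes() {
    grep -aoE '/Subtype */(Highlight|Underline|StrikeOut|Squiggly|FreeText|Text)' "$1" \
        | sed 's|/Subtype */||' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
#+END_SRC

The output lists one subtype per line with its count, e.g.
~Highlight 12~, which hints at what the parsing lines have to look for.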

After reading a paper and highlighting important phrases and words, I
stored the annotated paper as ~Voit2012-notes.pdf~ in my library
directory, as mentioned above. With ~C-c ) n~ I can add a sub-heading
to a reference that contains a small babel script. It executes
~vkextract_annotations_to_orgmode_snippet.sh~ with the reference.
[[https://github.com/novoid/extract_pdf_annotations_to_orgmode][The script]] parses the notes file and inserts every highlighted word
and every written annotation directly into my Org-mode file. This way,
my paper summaries were generated automatically and I could search
for references by keyword directly in Org-mode. How cool is that?

*** org-ref
:PROPERTIES:
:END:

John's workflow with ~org-ref~ seems to be designed for writing
research papers directly in Org-mode and using the export
functionality to get LaTeX or PDF files. He has invested much more
effort in his method than I did in mine, so I can't describe his
method in as much detail as I did with mine above.

You can find the source and documentation of org-ref [[https://github.com/jkitchin/org-ref][on GitHub]].

John made a screencast with eleven and a half minutes of awesomeness
describing the basics of his method:

#+BEGIN_EXPORT HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/JyvpSVl4_dg?rel=0" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
#+END_EXPORT

He also describes the new features of the recent update in ten
minutes:

#+BEGIN_EXPORT HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/2t925KRBbFc?rel=0" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
#+END_EXPORT

As you can see, his feature set is clearly more advanced than my
method: dragging and dropping PDF files generates Bibtex entries, and
dragging and dropping [[https://en.wikipedia.org/wiki/Digital_object_identifier][DOI URLs]] downloads the PDF papers *and* generates
Bibtex entries, to mention just two of many features.
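org-ref does all of this from within Emacs. Purely as an illustration
of the DOI-to-Bibtex step, and not as org-ref's implementation, here
is a shell sketch using DOI content negotiation: the ~doi.org~ resolver
returns a Bibtex entry when asked for ~application/x-bibtex~. The
function name ~doi_to_bib~ is hypothetical, and it assumes the entry
header ~@type{oldkey,~ is on the first line of the response so the
publisher's key can be replaced by the file name key.

#+BEGIN_SRC sh
# Usage: doi_to_bib DOI KEY LIBRARY_DIR
# Fetches Bibtex for DOI and stores it as LIBRARY_DIR/KEY.bib with KEY
# as its citation key. Refuses to overwrite an existing file.
doi_to_bib() {
    doi=$1 key=$2 library_dir=$3
    target="$library_dir/$key.bib"
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi
    # -s: silent, -f: fail on HTTP errors, -L: follow the resolver's redirects
    entry=$(curl -sfL -H "Accept: application/x-bibtex" "https://doi.org/$doi") || return 1
    # replace the publisher-chosen key on the first line with KEY
    printf '%s\n' "$entry" | sed "1s/^\( *@[A-Za-z]*\){[^,]*,/\1{$key,/" > "$target"
}
#+END_SRC

Downloading the PDF itself is publisher-specific and therefore left
out of this sketch.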

*** Superficial Comparison
:PROPERTIES:
:END:

I have to admit that I did not try out John's method myself. I just
watched the screencast videos and read some of the documentation
files. Currently, I don't need to manage references. If I have to in
the future, I would definitely check out and use John's method.

However, from my point of view, my method does have some advantages.
It supports writing in LaTeX (instead of Org-mode) a bit better. For
example, each of my papers uses a reference file containing only the
references cited in that specific paper and not my complete set of
references.

As far as I remember, I had to write all my papers in formats that the
Org-mode LaTeX export was not able to deliver without substantial
additional effort. Org-mode does not export to the ACM format.
Org-mode does not export to the LaTeX templates of conference
proceedings, which are mostly not very well done. Additionally, I
always had to tweak the LaTeX source here and there in order to
satisfy restrictions on space, my stupid level of typographic
perfection, or other things I can't accomplish with the Org-mode to
LaTeX export. I wonder how John is able to deal with this annoyance.

Update: John added a comment below which is quite interesting - you
should check it out.

My method uses shell scripts that have to be rewritten for users of
Windows systems. This is a clear disadvantage, although not for me.

Clearly, it is no big deal to "migrate" the few unique features of my
method to John's workflow. For example, [[https://github.com/novoid/extract_pdf_annotations_to_orgmode][extracting PDF annotations]]
can be added to John's method with little additional effort. I'd love
to have a positive influence on John's method with this blog entry
and my scripts. Maybe you are willing to send him pull requests with
minor improvements here and there. It's always better to have a
bigger community using the same method than to tinker with your own
small scripts all by yourself.

If you want to add your opinion or new ideas to this topic, please
leave a comment below!

Update 2021-06-16: There is [[https://soham.dev/posts/org-bibliography/][this blog article with a DIY method]], and
[[https://code.orgmode.org/bzg/org-mode/src/wip-cite-new/lisp/oc.el][oc.el]] is about to introduce out-of-the-box functionality for
reference management with Org. Its citation syntax looks promising:

: [cite/style/sub-style:global prefix;cite prefix @key1 cite suffix; global suffix]

Examples:

: simple [cite:@low2001]
: simple with locator suffix [cite:@low2001 p.23]
: citet style [cite/text:@low2001]
: multi-cite with global prefix: [cite:see ;@low2001;@mcneill2011]
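Because every key in this syntax is marked with ~@~, the per-paper
Bibtex idea from my LaTeX workflow above could be reused for Org
documents. As a minimal sketch (the function name ~org_cite_keys~ is
hypothetical), the following lists the keys cited in an Org file. It
assumes that keys consist of letters, digits, and ~_ : . -~ and that a
~[cite...]~ object does not span several lines.

#+BEGIN_SRC sh
# List the unique citation keys used in [cite...] objects of an Org file.
# First grep: extract the [cite...] objects; second grep: extract @keys.
org_cite_keys() {
    grep -o '\[cite[^]]*]' "$1" \
        | grep -o '@[A-Za-z0-9_:.-]*' \
        | sed 's/^@//' | sort -u
}
#+END_SRC

For the examples above, it prints ~low2001~ and ~mcneill2011~, which
could then be used to concatenate the matching ~KEY.bib~ files.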

*** A Workflow by Koustuv Sinha
:PROPERTIES:
:END:

[[https://irreal.org/blog/?p=10725][Irreal]] featured a blog article by [[https://cs.mcgill.ca/~ksinha4/][Koustuv Sinha]] who [[https://www.cs.mcgill.ca/~ksinha4/post/emacs_research_workflow/][writes about
his paper-reading workflow, which covers discovering, managing,
syncing, and annotating papers]] using Org-mode and parts of org-ref
mentioned above.

This is a really nice workflow, and I would most probably start by
testing it once I need to work with research papers again.