Scoring Elfeed Entries

(NB. I drafted this post a few months ago, when I first posted elfeed-score to MELPA, & am only now getting around to publishing it.)

In a previous post I talked about the process of moving from Feedly to Elfeed for my RSS needs. Not long after, I found myself wanting to surface particularly interesting items in the search view. I've used Gnus in the past, and I wished Elfeed had something like its scoring feature.

Poking around elfeed-search.el yielded:

(defcustom elfeed-search-sort-function nil
  "Sort predicate applied to the list of entries before display.

This function must take two entries as arguments, an interface
suitable as the predicate for `sort'.

Changing this from the default will lead to misleading results
during live filter editing, but the results be will correct when
live filter editing is exited."
  :group 'elfeed
  :type '(choice function (const nil)))

I reasoned that all I needed to do was set up a function that sorts entries by score (and then by date). But how to get a score into each entry?

I found this article from John Kitchin where he did something related, though a bit different:

(defun score-elfeed-entry (entry)
  (let ((title (elfeed-entry-title entry))
        (content (elfeed-deref (elfeed-entry-content entry)))
        (score 0))
    (loop for (pattern n) in '(("alloy" 1)
                               ("machine learning\\|neural" 1)
                               ("database" 1)
                               ("reproducible" 1)
                               ("carbon dioxide\\|CO2" 1)
                               ("oxygen evolution\\|OER\\|electrolysis" 1)
                               ("perovskite\\|polymorph\\|epitax" 1)
                               ("kitchin" 2))
          if (string-match pattern title)
          do (incf score n)
          if (string-match pattern content)
          do (incf score n))
    (message "%s - %s" title score)

    ;; store score for later in case I ever integrate machine learning
    (setf (elfeed-meta entry :my/score) score)

    (cond
     ((= score 1)
      (elfeed-tag entry 'relevant))
     ((> score 1)
      (elfeed-tag entry 'important)))
    entry))

(add-hook 'elfeed-new-entry-hook 'score-elfeed-entry)

Scoring elfeed articles, John Kitchin

Elfeed runs a hook, elfeed-new-entry-hook, for each new entry; he uses it to register a function that matches each new entry's title & content against certain keywords & computes a score from the results. He tucks the score away in a bit of custom meta-data on the entry (:my/score). He then tags the entry 'relevant if the score is exactly 1, 'important if it is greater than 1, or leaves it untagged otherwise; elsewhere in the article he uses elfeed-search-face-alist to highlight entries so tagged in the search buffer. That wasn't quite what I wanted, but that was OK: it showed me how to get the score into each new entry. I now had the building blocks for a solution.
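For reference, the highlighting half of that approach amounts to mapping a tag to a face in elfeed-search-face-alist, whose elements have the form (TAG FACE...). Here's a minimal sketch; the face name is made up for illustration, and the defvar only stands in for Elfeed's own variable so the snippet evaluates without Elfeed loaded:

```elisp
;; Stand-in so this evaluates without Elfeed; when Elfeed is loaded,
;; `defvar' leaves its existing value untouched.
(defvar elfeed-search-face-alist nil)

;; Illustrative face name (not part of Elfeed); any face would do.
(defface my-important-elfeed-entry
  '((t :weight bold))
  "Face for Elfeed entries tagged `important'.")

;; Entries carrying the `important' tag are drawn with the face above.
(push '(important my-important-elfeed-entry) elfeed-search-face-alist)
```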

The documentation for elfeed-search-sort-function says only that the function takes two entries & should be suitable as the predicate for sort; it doesn't say what the return value should mean. A little more digging showed me what the author meant:

(defun elfeed-search--update-list ()
  "Update `elfeed-search-filter' list."
  (let* ((filter (elfeed-search-parse-filter elfeed-search-filter))
         ...)
    ...
    ;; Determine the final list order
    (let ((entries (cdr head)))
      (when elfeed-search-sort-function
        (setf entries (sort entries elfeed-search-sort-function)))
      ...)))

OK, so what signature does sort expect of its predicate?

sort is a built-in function in ‘C source code’.

(sort SEQ PREDICATE)

  Probably introduced at or before Emacs version 16.

Sort SEQ, stably, comparing elements using PREDICATE.
Returns the sorted sequence.  SEQ should be a list or vector.  SEQ is
modified by side effects.  PREDICATE is called with two elements of
SEQ, and should return non-nil if the first element should sort before
the second.
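To make that contract concrete, here's a small sketch (the variable name my-pairs is only illustrative) of a predicate passed to sort. It also shows the stability guarantee, and why you keep sort's return value rather than relying on the original variable:

```elisp
;; PREDICATE returns non-nil when its first argument should come first.
;; `sort' is stable, so the two elements keyed 2 keep their input order.
(defvar my-pairs (list '(2 . "b1") '(1 . "a") '(2 . "b2")))

;; `sort' reorders the list destructively, so keep its return value.
(setq my-pairs (sort my-pairs (lambda (x y) (< (car x) (car y)))))
;; my-pairs is now ((1 . "a") (2 . "b1") (2 . "b2"))
```

Swapping < for > gives descending order, which is what a "highest score first" listing wants.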

There we have it. I wrote:

(defun elfeed-score-sort (a b)
  "Return non-nil if A should sort before B.
`elfeed-score' installs this as `elfeed-search-sort-function'."

  (let ((a-score (elfeed-meta a elfeed-score-meta-keyword
                              elfeed-score-default-score))
        (b-score (elfeed-meta b elfeed-score-meta-keyword
                              elfeed-score-default-score)))
    (if (> a-score b-score)
        t
      ;; On equal scores, newer entries sort first.  Use `=' rather
      ;; than `eq' so the tie check also holds for non-fixnum scores.
      (let ((a-date  (elfeed-entry-date a))
            (b-date  (elfeed-entry-date b)))
        (and (= a-score b-score) (> a-date b-date))))))
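To see the ordering this produces without loading Elfeed, here's a sketch of the same score-then-date logic run against a hypothetical stand-in struct (my-entry is not an Elfeed type; real entries keep their score in meta-data, as above):

```elisp
(require 'cl-lib)

;; Hypothetical stand-in for an Elfeed entry, carrying only the fields
;; the comparator needs.
(cl-defstruct (my-entry (:constructor my-entry-create))
  title score date)

(defun my-entry-sort (a b)
  "Return non-nil if A should sort before B.
Higher scores come first; ties are broken by newer date."
  (let ((sa (my-entry-score a))
        (sb (my-entry-score b)))
    (or (> sa sb)
        (and (= sa sb)
             (> (my-entry-date a) (my-entry-date b))))))

(defvar my-sorted-titles
  (mapcar #'my-entry-title
          (sort (list (my-entry-create :title "old-low" :score 0 :date 100.0)
                      (my-entry-create :title "new-low" :score 0 :date 200.0)
                      (my-entry-create :title "high"    :score 5 :date 50.0))
                #'my-entry-sort)))
;; my-sorted-titles is ("high" "new-low" "old-low")
```

With Elfeed loaded, the real predicate is put in place by setting the variable Elfeed consults, e.g. (setq elfeed-search-sort-function #'elfeed-score-sort).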

That's sorting. Next: scoring entries.

My idea was to approximate the Gnus score file format as closely as possible. It was familiar, and as a Lisp S-expression it had the advantage of being easy to parse via read-from-string. I added a version element, having learned from hard experience that formats evolve. I considered a number of entry attributes against which one could match, but decided to start simple: content, feed, & title. Analogous to the mark & expunge rules in Gnus, I added the option to mark all entries scoring below some threshold as read.
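For illustration, a score file in this spirit might contain something like the string below. The rule layout (TEXT VALUE TYPE DATE) follows the title branch of the parser that follows; the match-type symbols s and r, and the rule values, are assumptions of this sketch rather than the documented elfeed-score format:

```elisp
;; A hypothetical score file's contents.  Rules are (TEXT VALUE TYPE DATE);
;; the type symbols `s' (substring) and `r' (regexp) are assumed here.
(defconst my-score-file-text
  "((\"version\" 1)
    (\"title\"
     (\"emacs\" 100 s nil)
     (\"[Rr]ust\" 50 r nil)))")

;; `read-from-string' returns (OBJECT . END-POSITION); the rules are OBJECT.
(defconst my-score-form (car (read-from-string my-score-file-text)))

;; Each top-level element is keyed by a string, so `assoc' (which
;; compares with `equal') can find it.
(defconst my-score-version (cadr (assoc "version" my-score-form)))
```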

(defun elfeed-score--parse-score-file (score-file)
  "Parse SCORE-FILE.
Internal.  This is the core score file parsing routine.  Opens
SCORE-FILE, reads the contents as a Lisp form, and parses that
into a property list with the following properties:
    - :content
    - :feeds
    - :mark
    - :titles"

  (let ((raw-entries
         (car
          (read-from-string
           (with-temp-buffer
             (insert-file-contents score-file)
             (buffer-string)))))
        mark titles feeds content)
    (dolist (raw-item raw-entries)
      (let ((key  (car raw-item))
            (rest (cdr raw-item)))
        (cond
         ((string= key "version")
          (unless (eq 1 (car rest))
            (error "Unsupported score file version %s" (car rest))))
         ((string= key "title")
          (dolist (item rest)
            (let ((item-plist (list
                               :text  (nth 0 item)
                               :value (nth 1 item)
                               :type  (nth 2 item)
                               :date  (nth 3 item))))
              (unless (member item-plist titles)
                (setq titles (append titles (list item-plist)))))))
         ((string= key "content")
          ...)
         ...)))
    (list
     :mark mark
     :feeds feeds
     :titles titles
     :content content)))
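Stripped down to the version check & the title branch, and reading from a string instead of a file so it runs on its own, the same parsing approach can be sketched as below. The function name is hypothetical. One deliberate difference: it pushes each new rule & reverses once at the end instead of appending per rule, which avoids copying the list on every insertion.

```elisp
(defun my-score-parse-titles (text)
  "Parse TEXT as a score-file form and return its title rules as plists.
Signal an error unless the form declares version 1."
  (let ((raw (car (read-from-string text)))
        titles)
    (dolist (item raw)
      ;; String patterns in `pcase' match with `equal'.
      (pcase (car item)
        ("version"
         (unless (eq 1 (cadr item))
           (error "Unsupported score file version %s" (cadr item))))
        ("title"
         (dolist (rule (cdr item))
           (let ((plist (list :text  (nth 0 rule)
                              :value (nth 1 rule)
                              :type  (nth 2 rule)
                              :date  (nth 3 rule))))
             ;; `member' compares with `equal', so exact duplicates are dropped.
             (unless (member plist titles)
               (push plist titles)))))))
    (nreverse titles)))
```

Called on a form with a duplicated "emacs" rule, it returns one plist per distinct rule, in file order.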

So, after calling this function, I've got a list of rules (to match against feeds, entry titles, entry content, &c), which I store in package-prefixed global variables. All that was left was to apply the rules in a new entry hook:

(defun elfeed-score--score-entry (entry)
  "Score an Elfeed ENTRY.
This function will return the entry's score, update its meta-data, and
update the \"last matched\" time of the salient rules."

  (let ((title   (elfeed-entry-title entry))
        (feed    (elfeed-entry-feed  entry))
        (content (elfeed-deref (elfeed-entry-content entry)))
        (score   elfeed-score-default-score))
    ;; score on the entry title
    (dolist (score-title elfeed-score--score-titles)
      (let* ((match-text (plist-get score-title :text))
             (value      (plist-get score-title :value))
             (match-type (plist-get score-title :type))
             (got-match  (elfeed-score--match-text match-text title match-type)))
        (when got-match
          (elfeed-score--debug "'%s' + %d (title)" title value)
          (setq score (+ score value))
          ;; record when this rule last matched
          (plist-put score-title :date (float-time)))))
    ;; score on the entry feed
    ...
    ;; score on the entry content
    ...
    (setf (elfeed-meta entry elfeed-score-meta-keyword) score)
    ;; entries scoring below the mark threshold are marked as read
    (when (and elfeed-score--score-mark
               (< score elfeed-score--score-mark))
      (elfeed-untag entry 'unread))
    score))
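elfeed-score--match-text itself isn't shown here. As a rough, self-contained sketch of what such a matcher & the title-scoring loop might look like, assuming four hypothetical match types (s and S for case-insensitive and case-sensitive substrings, r and R for the regexp equivalents) and plain strings in place of Elfeed entries:

```elisp
(defun my-score-match-text (match-text search-text match-type)
  "Return non-nil if MATCH-TEXT matches SEARCH-TEXT under MATCH-TYPE.
MATCH-TYPE is one of the hypothetical symbols s, S (substring) or
r, R (regexp); the lower-case variants ignore case."
  (let ((case-fold-search (memq match-type '(s r))))
    ;; Entry content can be nil, so guard before matching.
    (and search-text
         (pcase match-type
           ((or 's 'S) (string-match-p (regexp-quote match-text) search-text))
           ((or 'r 'R) (string-match-p match-text search-text))
           (_ (error "Unknown match type %s" match-type))))))

(defun my-score-title (title rules default)
  "Return DEFAULT plus the :value of each rule in RULES that matches TITLE."
  (let ((score default))
    (dolist (rule rules score)
      (when (my-score-match-text (plist-get rule :text) title
                                 (plist-get rule :type))
        (setq score (+ score (plist-get rule :value)))))))
```

Binding case-fold-search around string-match-p is the usual Emacs Lisp way to pick case sensitivity per call, and regexp-quote keeps substring rules from being interpreted as regexps.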

Functionally, that's really all there was to it.

The next step was packaging this up for submission to MELPA. If you have never submitted your work there, be forewarned: they have a lengthy set of coding standards & they enforce them. This process involved:

The major change from my first, naive implementation was supporting the "enable/disable" features recommended in the Emacs Lisp Coding Conventions. Once that was done, the process of submitting to MELPA was straightforward:

Within a few days, my PR was reviewed, accepted & merged.

03/22/20 19:56