LastCrusader.Micropub.PostTypeDiscovery (Last Crusader v0.11.0)

IndieWeb Post Type Discovery implementation.

See https://indieweb.org/post-type-discovery

Post Type Discovery specifies an algorithm that lets consuming code determine the type of a post from its content properties and their values rather than from an explicit “post type” property. This is better matched to modern post creation UIs, which allow combining text, media, etc. in a variety of ways without burdening users with any notion of what kind of post they are creating.

The Post Type Discovery algorithm ("the algorithm") discovers the type of a post given a data structure representing a post with a flat set of properties (e.g. Activity Streams (1.0 or 2.0) JSON, or JSON output from parsing microformats2), each with one or more values. It does so by following these steps until reaching the first "it is a(n) ... post" statement, at which point the "..." is the discovered post type.

  1. If the post has an "rsvp" property with a valid value,
  Then it is an RSVP post.
  2. If the post has an "in-reply-to" property with a valid URL,
  Then it is a reply post.
  3. If the post has a "repost-of" property with a valid URL,
  Then it is a repost (AKA "share") post.
  4. If the post has a "like-of" property with a valid URL,
  Then it is a like (AKA "favorite") post.
  5. If the post has a "video" property with a valid URL,
  Then it is a video post.
  6. If the post has a "photo" property with a valid URL,
  Then it is a photo post.
  7. If the post has a "content" property with a non-empty value,
  Then use its first non-empty value as the content
  8. Else if the post has a "summary" property with a non-empty value,
  Then use its first non-empty value as the content
  9. Else it is a note post.
  10. If the post has no "name" property
    or has a "name" property with an empty string value (or no value)
  Then it is a note post.
  11. Take the first non-empty value of the "name" property
  12. Trim all leading/trailing whitespace
  13. Collapse all sequences of internal whitespace to a single space (0x20) character each
  14. Do the same with the content
  15. If this processed "name" property value is NOT a prefix of the processed content,
  Then it is an article post.
  16. It is a note post.
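
The steps above can be sketched in Elixir roughly as follows. This is a minimal illustration, not the code of this module: the module name PostTypeSketch is hypothetical, the post is assumed to be a map with string keys and list values, and "valid value" / "valid URL" are simplified to "non-empty string". The returned atoms follow the names used in the list above rather than the post_type() type documented below.

```elixir
defmodule PostTypeSketch do
  # Hypothetical module, not part of Last Crusader.
  # Assumed input shape: %{"name" => ["Hi"], "content" => ["Hi there"]}.

  # Steps 1-6: properties checked in the order given by the algorithm.
  @ordered [
    {"rsvp", :rsvp},
    {"in-reply-to", :reply},
    {"repost-of", :repost},
    {"like-of", :like},
    {"video", :video},
    {"photo", :photo}
  ]

  def discover(post) when is_map(post) do
    case Enum.find(@ordered, fn {prop, _type} -> first_value(post, prop) != nil end) do
      {_prop, type} -> type
      nil -> note_or_article(post)
    end
  end

  # Steps 7-16: decide between a note and an article.
  defp note_or_article(post) do
    content = first_value(post, "content") || first_value(post, "summary")
    name = first_value(post, "name")

    cond do
      # Step 9: neither content nor summary.
      content == nil -> :note
      # Step 10: no usable name.
      name == nil -> :note
      # Steps 11-16: a name that is a prefix of the content is not a title.
      String.starts_with?(normalize(content), normalize(name)) -> :note
      true -> :article
    end
  end

  # Steps 12-13: trim, then collapse whitespace runs to a single space.
  defp normalize(text) do
    text |> String.trim() |> String.replace(~r/\s+/, " ")
  end

  # First non-empty string value of a property, or nil if there is none.
  defp first_value(post, prop) do
    post
    |> Map.get(prop, [])
    |> List.wrap()
    |> Enum.find(fn v -> is_binary(v) and String.trim(v) != "" end)
  end
end
```

Under these assumptions, a post whose "name" merely repeats the start of its "content" comes out as a note, while a post with a distinct "name" comes out as an article.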

Quoted property names in the algorithm are defined in h-entry.

Summary

Functions

discover(post)

Discover the post type according to the official algorithm.

name_is_title?(name, content)

Determine whether the name property represents an explicit title.

Types

post_type()

@type post_type() ::
  :note
  | :article
  | :bookmark
  | :rvsp
  | :in_reply_to
  | :like_of
  | :video
  | :photo

Functions

discover(post)

@spec discover(any()) :: post_type()

Discover the post type according to the official algorithm. The result can be one of:

  • :note
  • :article
  • :bookmark
  • :rvsp
  • :in_reply_to
  • :like_of
  • :video
  • :photo

name_is_title?(name, content)

@spec name_is_title?(String.t(), String.t()) :: boolean()

Determine whether the name property represents an explicit title.

See one Python implementation, because the official explanation of the algorithm is not clear at all (to me). I took the documentation (and unit tests) from it:

Typically when parsing an h-entry, we check whether p-name == e-content (value). If they are non-equal, then p-name likely represents a title.

However, occasionally we come across an h-entry that does not provide an explicit p-name. In this case, the name is automatically generated by converting the entire h-entry content to plain text. This definitely does not represent a title, and looks very bad when displayed as such.

To handle this case, we broaden the equality check to see if content is a subset of name. We also strip out non-alphanumeric characters just to make the check a little more forgiving.
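
The broadened check described above might look something like the following sketch. It is not the implementation of this module: the module name TitleSketch is hypothetical, and the handling of an empty name or empty content is an assumption made only for this example.

```elixir
defmodule TitleSketch do
  # Hypothetical helper, not the library's implementation.

  def name_is_title?(name, content) when is_binary(name) and is_binary(content) do
    n = strip(name)
    c = strip(content)

    cond do
      # Assumption: an empty name cannot be a title.
      n == "" -> false
      # Assumption: a name with no content to compare against is a title.
      c == "" -> true
      # The name is a title unless the content appears inside it.
      true -> not String.contains?(n, c)
    end
  end

  # Remove every character that is not a Unicode letter or digit.
  defp strip(text) do
    String.replace(text, ~r/[^\p{L}\p{N}]/u, "")
  end
end
```

For instance, under these assumptions a name generated from the content, such as "Hello, world! More text" for the content "Hello world", is not treated as a title, whereas "My Title" with the content "Some body text" is.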