
The op3r package was created to bring the novel Open Podcast Prefix Project (OP3) API to R users through a streamlined and straightforward interface. A growing number of podcasts adopting the Podcasting 2.0 namespace are opting in to the OP3 service as a robust and transparent source of statistics for their podcast.

Initial setup

The API endpoints supported by op3r require an API token. To obtain your own token, create a free developer account at https://op3.dev/api/docs and save the token as an environment variable called OP3_API_TOKEN in a project-level or default user-directory .Renviron file. Below is an example (change the value to match your token):

OP3_API_TOKEN="abcd123"

Once the package is installed, you can verify that authentication is working as expected:
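
A minimal base R sketch of a first check is given here. It only confirms that the OP3_API_TOKEN environment variable from the setup step is visible to the current R session; it does not contact the OP3 API and does not use any op3r function. The name token_is_set is hypothetical and used only for illustration.

```r
# Read the token from the environment; Sys.getenv() returns "" when it is unset
token <- Sys.getenv("OP3_API_TOKEN", unset = "")

# token_is_set is a hypothetical name used only in this sketch
token_is_set <- nzchar(token)

if (token_is_set) {
  message("OP3_API_TOKEN found (", nchar(token), " characters).")
} else {
  message("OP3_API_TOKEN is not set; add it to .Renviron and restart R.")
}
```

If the variable was added to .Renviron during the current session, R needs to be restarted (or the file re-read with readRenviron()) before the value becomes visible.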

Case Study

The OP3 API endpoints provide two general capabilities:

  • Metrics pooled across all podcasts using the OP3 service.
  • Metrics for a specific podcast using the OP3 service.

The examples below demonstrate each of these perspectives. The podcast used for the single-podcast examples is the R Weekly Highlights Podcast hosted by Eric Nantz and Michael Thomas. First we load additional R packages used for analysis and visualization:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Podcast metadata

Each podcast using the OP3 project is assigned a universally unique identifier (UUID). Many of the functions tailored to specific podcasts rely on this identifier as a parameter. To find this identifier for a given podcast, you can use either of the following methods:

  • Use the op3_show() function and supply either the podcast’s GUID or RSS feed URL for the required show_id parameter:
rweekly_show_rss <- "https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b231-4ffad7979b44"

op3_show(show_id = rweekly_show_rss)
#> # A tibble: 1 × 4
#>   showUuid                         title               podcastGuid  statsPageUrl
#>   <chr>                            <chr>               <chr>        <chr>       
#> 1 c008c9c7cfe847dda55cfdde54a22154 R Weekly Highlights bb28afcc-13… https://op3…
  • Use the op3_get_show_uuid() function with either the podcast GUID or RSS feed URL for show_id:
op3_get_show_uuid("https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b231-4ffad7979b44")
#> [1] "c008c9c7cfe847dda55cfdde54a22154"

op3_get_show_uuid("bb28afcc-137e-5c66-b231-4ffad7979b44")
#> [1] "c008c9c7cfe847dda55cfdde54a22154"
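
The three identifier forms used above (RSS feed URL, podcast GUID, and OP3 show UUID) are easy to mix up. The sketch below is a hypothetical helper, classify_show_id(), that is not part of op3r. It guesses which form a string is in, assuming the shapes seen in the output above: a hyphenated 8-4-4-4-12 hexadecimal GUID and a 32-character hexadecimal UUID without hyphens.

```r
# classify_show_id() is a hypothetical helper for illustration, not an op3r function
classify_show_id <- function(show_id) {
  # Hyphenated 8-4-4-4-12 hex string, as in the podcastGuid column
  guid_pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  # 32 hex characters without hyphens, as in the showUuid column
  uuid_pattern <- "^[0-9a-f]{32}$"

  if (grepl("^https?://", show_id)) {
    "rss_feed_url"
  } else if (grepl(guid_pattern, show_id, ignore.case = TRUE)) {
    "podcast_guid"
  } else if (grepl(uuid_pattern, show_id, ignore.case = TRUE)) {
    "op3_show_uuid"
  } else {
    "unknown"
  }
}

classify_show_id("https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b231-4ffad7979b44")
classify_show_id("bb28afcc-137e-5c66-b231-4ffad7979b44")
classify_show_id("c008c9c7cfe847dda55cfdde54a22154")
```

Note that the show UUID is not simply the GUID with its hyphens removed; in the output above the two values differ, so a lookup is still needed to convert one into the other.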

Download Metrics

Within the podcast industry, many hosts and hosting companies gravitate towards download metrics. The op3_downloads_show() function obtains download statistics for the 30 days preceding the time the function is executed (for additional details on how OP3 calculates downloads, refer to the official documentation):

# using the OP3 UUID for R Weekly Highlights
show_id <- "c008c9c7cfe847dda55cfdde54a22154"
op3_downloads_show(show_id = show_id)
#> # A tibble: 1 × 5
#>   daysBotDownloads monthlyDownloads weeklyAvgDownloads numWeeks download_data   
#>              <dbl>            <int>              <int>    <int> <list>          
#> 1               30             1972                470        4 <tibble [4 × 2]>

If you are interested in the downloads per week captured in the metrics, you can run the same function with nest_downloads = FALSE:

op3_downloads_show(show_id = show_id, nest_downloads = FALSE) |>
  select(weekNumber, weeklyDownloads)
#> # A tibble: 4 × 2
#>   weekNumber weeklyDownloads
#>        <int>           <int>
#> 1          1             620
#> 2          2             619
#> 3          3             499
#> 4          4             143
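
Once the weekly values are in a flat table, summary figures are straightforward to derive. The sketch below rebuilds the four rows shown above by hand (so it runs without an API token) and uses base R only. The column name share and the object names are assumptions made for this illustration.

```r
# Weekly downloads copied from the output above
weekly <- data.frame(
  weekNumber = 1:4,
  weeklyDownloads = c(620L, 619L, 499L, 143L)
)

# Total and mean across the four weeks
total_downloads <- sum(weekly$weeklyDownloads)
mean_downloads <- mean(weekly$weeklyDownloads)

# Each week's proportion of the four-week total
weekly$share <- weekly$weeklyDownloads / total_downloads

total_downloads
mean_downloads
weekly
```

The mean of these four values is 470.25, which rounds to the weeklyAvgDownloads value of 470 in the earlier output; whether the API derives its average this way is an assumption. The four weeks sum to 1881, which differs from the monthlyDownloads value of 1972, and the reason for that difference is not covered here.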

Additional download metrics are available based on applications used to play the podcast. The op3_top_show_apps() function obtains metrics over the last three calendar months from the time of execution:

show_guid <- "bb28afcc-137e-5c66-b231-4ffad7979b44"
op3_top_show_apps(show_guid)
#> # A tibble: 33 × 3
#>    app_name          value show_uuid                       
#>    <chr>             <int> <chr>                           
#>  1 Apple Podcasts      778 c008c9c7cfe847dda55cfdde54a22154
#>  2 Overcast            550 c008c9c7cfe847dda55cfdde54a22154
#>  3 Pocket Casts        350 c008c9c7cfe847dda55cfdde54a22154
#>  4 gPodder             339 c008c9c7cfe847dda55cfdde54a22154
#>  5 Podcast Addict      265 c008c9c7cfe847dda55cfdde54a22154
#>  6 YouTube Music       188 c008c9c7cfe847dda55cfdde54a22154
#>  7 Google Podcasts     158 c008c9c7cfe847dda55cfdde54a22154
#>  8 AntennaPod          139 c008c9c7cfe847dda55cfdde54a22154
#>  9 Snipd                93 c008c9c7cfe847dda55cfdde54a22154
#> 10 Unknown Apple App    89 c008c9c7cfe847dda55cfdde54a22154
#> # ℹ 23 more rows
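
Raw counts per app are often easier to compare as percentages. The following base R sketch rebuilds only the ten rows displayed above (the remaining 23 rows are not shown, so the percentages are relative to the displayed rows, not to all downloads). The column name pct_of_displayed is an assumption for this illustration.

```r
# The ten rows visible in the output above, entered by hand
top_apps <- data.frame(
  app_name = c(
    "Apple Podcasts", "Overcast", "Pocket Casts", "gPodder", "Podcast Addict",
    "YouTube Music", "Google Podcasts", "AntennaPod", "Snipd", "Unknown Apple App"
  ),
  value = c(778L, 550L, 350L, 339L, 265L, 188L, 158L, 139L, 93L, 89L)
)

# Percentage of the displayed total attributed to each app
displayed_total <- sum(top_apps$value)
top_apps$pct_of_displayed <- round(100 * top_apps$value / displayed_total, 1)

top_apps
```

With the full tibble returned by op3_top_show_apps(), the same calculation over all 33 rows would give each app's percentage of all counted downloads.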

Redirect Logs / Hits

Each of the functions demonstrated above is a convenient way to obtain derived metrics that the corresponding OP3 API endpoints have already computed behind the scenes. The common source of information behind all of them is the redirect logs (hits), a collection of metadata generated each time a client such as a podcast application or web browser requests the URL for a podcast episode. To illustrate, here is the most recent log entry at the time this vignette was compiled:

op3_hits(limit = 1) |>
  glimpse()
#> Rows: 1
#> Columns: 14
#> $ time            <dttm> 2024-05-25 18:25:01
#> $ uuid            <chr> "2525a8c995d14e97b9ff4fb219ef0e2c"
#> $ hashedIpAddress <chr> "9e94063e25414d195e4ebf99aaedf985198b95b5"
#> $ method          <chr> "GET"
#> $ url             <chr> "https://op3.dev/e/traffic.megaphone.fm/ADV1129270641…
#> $ userAgent       <chr> "Spotify/8.9.42.575 Android/33 (LM-G900)"
#> $ range           <chr> "bytes=0-3145727"
#> $ edgeColo        <chr> "ORD"
#> $ continent       <chr> "NA"
#> $ country         <chr> "US"
#> $ timezone        <chr> "America/New_York"
#> $ regionCode      <chr> "KY"
#> $ region          <chr> "Kentucky"
#> $ metroCode       <chr> "541"

In addition to retrieving logs across all shows, we can restrict the logs to a particular podcast by setting the url parameter to either a direct link to a podcast episode or a version of the URL with a wildcard placeholder (*) that matches multiple episodes.

How can we obtain these episode download URLs? You certainly could obtain the podcast’s RSS feed and manually inspect each episode entry to find the direct link to an episode. However, another R package called {podindexr} contains many functions to obtain metadata for podcasts registered on the Podcast Index. For example, the episodes_byurl() function takes a podcast RSS feed as input to obtain a collection of metadata associated with up to 1,000 episodes of a podcast. Here is the metadata associated with the last three episodes of the R Weekly Highlights podcast:

# remotes::install_github("rpodcast/podindexr")
library(podindexr)
episode_df <- episodes_byurl(rweekly_show_rss, max = 3)
glimpse(episode_df)
#> Rows: 3
#> Columns: 30
#> $ id                  <dbl> 22601290393, 22315244014, 22028048965
#> $ title               <chr> "Issue 2024-W20 Highlights", "Issue 2024-W19 Highl…
#> $ link                <chr> "https://serve.podhome.fm/episodepage/r-weekly-hig…
#> $ description         <chr> "An aesthetically-pleasing journey through the his…
#> $ guid                <chr> "54574074-0a1f-4951-8f2a-9084678c6604", "8c23a72a-…
#> $ datePublished       <int> 1715756520, 1715152260, 1714548120
#> $ datePublishedPretty <chr> "May 15, 2024 2:02am", "May 08, 2024 2:11am", "May…
#> $ dateCrawled         <int> 1715756613, 1715756613, 1715756613
#> $ enclosureUrl        <chr> "https://op3.dev/e/serve.podhome.fm/episode/99cfd3…
#> $ enclosureType       <chr> "audio/mpeg", "audio/mpeg", "audio/mpeg"
#> $ enclosureLength     <int> 71545543, 71466423, 53758290
#> $ duration            <int> 2955, 2946, 2215
#> $ explicit            <int> 0, 0, 0
#> $ episode             <int> 165, 164, 163
#> $ episodeType         <chr> "full", "full", "full"
#> $ season              <int> 0, 0, 0
#> $ image               <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ feedItunesId        <int> 1527876975, 1527876975, 1527876975
#> $ feedUrl             <chr> "https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b…
#> $ feedImage           <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ feedId              <int> 1062040, 1062040, 1062040
#> $ podcastGuid         <chr> "bb28afcc-137e-5c66-b231-4ffad7979b44", "bb28afcc-…
#> $ feedLanguage        <chr> "en-us", "en-us", "en-us"
#> $ feedDead            <int> 0, 0, 0
#> $ chaptersUrl         <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ transcriptUrl       <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ soundbite           <list> [915, 264, "Intro to DuckDB"], <NULL>, <NULL>
#> $ soundbites          <list> [[915, 264, "Intro to DuckDB"], [915, 264, "Intro …
#> $ persons             <list> [[38105478, "Mike Thomas", "host", "cast", "https:…
#> $ transcripts         <list> [["https://assets.podhome.fm/99cfd30f-e40c-426a-35…

The enclosureUrl field gives us the direct link to each podcast episode:

pull(episode_df, enclosureUrl)
#> [1] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/63851340872787683054574074-0a1f-4951-8f2a-9084678c6604v1.mp3"
#> [2] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/6385073280234160488c23a72a-d2f2-49e8-a460-df0535340d29v1.mp3"
#> [3] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/6385013221315251258372ff1e-a595-4b18-a1cd-036a28153ccdv1.mp3"

As seen in the output above, we can create a wildcard version of the URL by replacing the MP3 file name at the end with a * character. Now we can run a revised call to op3_hits() using this URL as a parameter:

wildcard_url <- "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/*"

op3_hits(limit = 20, url = wildcard_url)
#> # A tibble: 20 × 15
#>    time                uuid      hashedIpAddress method url   userAgent edgeColo
#>    <dttm>              <chr>     <chr>           <chr>  <chr> <chr>     <chr>   
#>  1 2024-05-25 18:11:30 a11378c4… be723a50c9306d… GET    http… Podcasts… DFW     
#>  2 2024-05-25 18:00:33 b0992f89… 3288d89f797364… GET    http… AppleCor… YYZ     
#>  3 2024-05-25 17:59:33 d5e4eb49… 3288d89f797364… GET    http… AppleCor… YYZ     
#>  4 2024-05-25 17:58:04 ebbd6247… 3288d89f797364… GET    http… AppleCor… YYZ     
#>  5 2024-05-25 17:56:22 8c41efa0… 3288d89f797364… GET    http… AppleCor… YYZ     
#>  6 2024-05-25 17:56:22 d969ae1e… 3288d89f797364… GET    http… AppleCor… YYZ     
#>  7 2024-05-25 17:56:20 022bb28a… 3288d89f797364… GET    http… AppleCor… YYZ     
#>  8 2024-05-25 16:01:02 92326e56… d02c4cda211ac8… HEAD   http… iTMS      SJC     
#>  9 2024-05-25 14:38:35 5062341d… 0d48c6b69d4fe3… GET    http… CastBox/… FRA     
#> 10 2024-05-25 11:57:22 b4624c5a… b7bc8dca9ca126… GET    http… CastBox/… LHR     
#> 11 2024-05-25 11:57:21 1221118e… b7bc8dca9ca126… GET    http… CastBox/… LHR     
#> 12 2024-05-25 11:57:16 8e46ad9a… b7bc8dca9ca126… GET    http… CastBox/… LHR     
#> 13 2024-05-25 11:57:14 b0b6a72a… b7bc8dca9ca126… GET    http… CastBox/… LHR     
#> 14 2024-05-25 11:57:05 e2ff1880… b7bc8dca9ca126… GET    http… CastBox/… LHR     
#> 15 2024-05-25 11:57:04 62597cd8… b7bc8dca9ca126… GET    http… CastBox/… LHR     
#> 16 2024-05-25 09:28:29 f0c630db… 4759e0cfecd409… GET    http… AntennaP… CPH     
#> 17 2024-05-25 09:28:28 a9f8bc2c… 4759e0cfecd409… GET    http… AntennaP… CPH     
#> 18 2024-05-25 09:17:36 247467ca… aba6ac051add9a… HEAD   http… iTMS      SEA     
#> 19 2024-05-25 06:39:17 046c6786… 8d24f67d8898e3… GET    http… atc/1.0   MCI     
#> 20 2024-05-25 04:43:20 be539b09… 77f0f5aedc514b… HEAD   http… Apache-H… FRA     
#> # ℹ 8 more variables: continent <chr>, country <chr>, timezone <chr>,
#> #   regionCode <chr>, region <chr>, metroCode <chr>, range <chr>, xpsId <chr>
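
The wildcard URL used above was written by hand. As a sketch, it can also be derived from the enclosureUrl values by replacing everything after the final slash with *. The URLs are copied from the earlier output so the example runs on its own, and the object names are assumptions for this illustration.

```r
# Episode URLs copied from the enclosureUrl output shown earlier
enclosure_urls <- c(
  "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/63851340872787683054574074-0a1f-4951-8f2a-9084678c6604v1.mp3",
  "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/6385073280234160488c23a72a-d2f2-49e8-a460-df0535340d29v1.mp3",
  "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/6385013221315251258372ff1e-a595-4b18-a1cd-036a28153ccdv1.mp3"
)

# Replace the file name (all characters after the last "/") with "*",
# then keep one copy of each distinct prefix
derived_wildcards <- unique(sub("[^/]+$", "*", enclosure_urls))

derived_wildcards
#> [1] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/*"
```

All three episodes share one prefix here, so a single wildcard results. A podcast whose episodes live under several paths would yield several wildcards, each of which would need its own op3_hits() call.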