op3r
The op3r package was created to bring the novel Open Podcast Prefix Project (OP3) API to R users through a streamlined and straightforward interface. A growing number of podcasts adopting the Podcasting 2.0 namespace are opting in to the OP3 service as a robust and transparent source of statistics associated with their podcast.
Initial setup
The API endpoints supported by op3r require an API token. To obtain your own token, create a free developer account at https://op3.dev/api/docs and save the token as an environment variable called OP3_API_TOKEN within a project-level or default user-directory .Renviron file. Below is an example (change the value to match your token):
OP3_API_TOKEN="abcd123"
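After restarting R so that the .Renviron file is re-read, a quick base R check can confirm that the variable is visible to the session. This is a minimal sketch that is independent of op3r; the helper name token_is_set() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of op3r): returns TRUE when the named
# environment variable is set to a non-empty value in the current session.
token_is_set <- function(name = "OP3_API_TOKEN") {
  nzchar(Sys.getenv(name, unset = ""))
}

token_is_set()
```

A FALSE result usually means the .Renviron file was saved in a location R does not read at startup, or that the session was not restarted after editing it.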
Once the package is installed, you can verify if authentication is working as expected:
Case Study
The API endpoints of the OP3 service provide the following general capabilities:
- Metrics pooled across all podcasts that use the OP3 service.
- Metrics associated with a specific podcast that uses the OP3 service.
The examples below demonstrate each of these perspectives. The podcast used for the single-podcast examples is the R Weekly Highlights Podcast hosted by Eric Nantz and Michael Thomas. First we load additional R packages used for analysis and visualization:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Podcast metadata
Each podcast using the OP3 project is assigned a universally unique identifier (UUID). Many of the functions tailored to specific podcasts rely on this identifier as a parameter. To find this identifier for a given podcast, you can use either of the following methods:
- Use the op3_show() function and supply either the podcast’s GUID or RSS feed URL for the required show_id parameter:
rweekly_show_rss <- "https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b231-4ffad7979b44"
op3_show(show_id = rweekly_show_rss)
#> # A tibble: 1 × 4
#> showUuid title podcastGuid statsPageUrl
#> <chr> <chr> <chr> <chr>
#> 1 c008c9c7cfe847dda55cfdde54a22154 R Weekly Highlights bb28afcc-13… https://op3…
- Use the op3_get_show_uuid() function with either the podcast GUID or RSS feed URL for show_id:
op3_get_show_uuid("https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b231-4ffad7979b44")
#> [1] "c008c9c7cfe847dda55cfdde54a22154"
op3_get_show_uuid("bb28afcc-137e-5c66-b231-4ffad7979b44")
#> [1] "c008c9c7cfe847dda55cfdde54a22154"
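The three kinds of identifier shown above have visibly different shapes: the OP3 UUID is 32 hexadecimal characters, the podcast GUID is hyphenated in an 8-4-4-4-12 layout, and the RSS feed is a URL. The sketch below tells them apart with regular expressions. The helper classify_show_id() is hypothetical and not part of op3r, and the patterns are inferred only from the example values in this vignette, not from an OP3 specification.

```r
# Hypothetical helper (not part of op3r): guess which kind of identifier a
# single string is, using patterns inferred from the examples in this vignette.
classify_show_id <- function(x) {
  hex <- "[0-9a-fA-F]"
  uuid_pattern <- paste0("^", hex, "{32}$")
  guid_pattern <- paste0(
    "^", hex, "{8}-", hex, "{4}-", hex, "{4}-", hex, "{4}-", hex, "{12}$"
  )
  if (grepl("^https?://", x)) {
    "feed_url"
  } else if (grepl(uuid_pattern, x)) {
    "op3_uuid"
  } else if (grepl(guid_pattern, x)) {
    "podcast_guid"
  } else {
    "unknown"
  }
}

classify_show_id("c008c9c7cfe847dda55cfdde54a22154")
classify_show_id("bb28afcc-137e-5c66-b231-4ffad7979b44")
classify_show_id("https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b231-4ffad7979b44")
```

A check like this can help catch a mistyped identifier before it is sent to the API.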
Download Metrics
Within the podcast industry, many hosts and hosting companies gravitate towards download metrics. The op3_downloads_show() function obtains download statistics for the 30 days preceding the time the function is executed (for additional details on how OP3 calculates downloads, refer to the official documentation):
# using the OP3 UUID for R Weekly Highlights
show_id <- "c008c9c7cfe847dda55cfdde54a22154"
op3_downloads_show(show_id = show_id)
#> # A tibble: 1 × 5
#> daysBotDownloads monthlyDownloads weeklyAvgDownloads numWeeks download_data
#> <dbl> <int> <int> <int> <list>
#> 1 30 1972 470 4 <tibble [4 × 2]>
If you are interested in the downloads per week captured in the metrics, you can run the same function with nest_downloads = FALSE:
op3_downloads_show(show_id = show_id, nest_downloads = FALSE) |>
  select(weekNumber, weeklyDownloads)
#> # A tibble: 4 × 2
#> weekNumber weeklyDownloads
#> <int> <int>
#> 1 1 620
#> 2 2 619
#> 3 3 499
#> 4 4 143
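Once the weekly rows are unnested, ordinary data-frame operations apply. The sketch below hard-codes the four weekly values from the output above instead of calling the API, then computes a total, a mean, and each week's share of the total. The object names are illustrative.

```r
# Weekly download counts copied by hand from the output above
weekly <- data.frame(
  weekNumber = 1:4,
  weeklyDownloads = c(620L, 619L, 499L, 143L)
)

# Total and mean across the four weeks
total_downloads <- sum(weekly$weeklyDownloads)
mean_downloads <- mean(weekly$weeklyDownloads)

# Each week's share of the four-week total
weekly$share <- weekly$weeklyDownloads / total_downloads
weekly
```

For these four values the mean is 470.25, in line with the weeklyAvgDownloads value of 470 reported earlier.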
Additional download metrics are available based on the applications used to play the podcast. The op3_top_show_apps() function obtains metrics over the last three calendar months from the time of execution:
show_guid <- "bb28afcc-137e-5c66-b231-4ffad7979b44"
op3_top_show_apps(show_guid)
#> # A tibble: 33 × 3
#> app_name value show_uuid
#> <chr> <int> <chr>
#> 1 Apple Podcasts 778 c008c9c7cfe847dda55cfdde54a22154
#> 2 Overcast 550 c008c9c7cfe847dda55cfdde54a22154
#> 3 Pocket Casts 350 c008c9c7cfe847dda55cfdde54a22154
#> 4 gPodder 339 c008c9c7cfe847dda55cfdde54a22154
#> 5 Podcast Addict 265 c008c9c7cfe847dda55cfdde54a22154
#> 6 YouTube Music 188 c008c9c7cfe847dda55cfdde54a22154
#> 7 Google Podcasts 158 c008c9c7cfe847dda55cfdde54a22154
#> 8 AntennaPod 139 c008c9c7cfe847dda55cfdde54a22154
#> 9 Snipd 93 c008c9c7cfe847dda55cfdde54a22154
#> 10 Unknown Apple App 89 c008c9c7cfe847dda55cfdde54a22154
#> # ℹ 23 more rows
Redirect Logs / Hits
Each of the functions demonstrated above is a convenient way to obtain derived metrics that the corresponding OP3 API endpoints pre-compute behind the scenes. The common source of information behind all of them is the redirect log (hit): a record of metadata generated each time a client, such as a podcast application or web browser, requests the URL for a podcast episode. To illustrate, here is the most recent log metadata at the time this vignette was compiled:
op3_hits(limit = 1) |>
  glimpse()
#> Rows: 1
#> Columns: 14
#> $ time <dttm> 2024-05-25 18:25:01
#> $ uuid <chr> "2525a8c995d14e97b9ff4fb219ef0e2c"
#> $ hashedIpAddress <chr> "9e94063e25414d195e4ebf99aaedf985198b95b5"
#> $ method <chr> "GET"
#> $ url <chr> "https://op3.dev/e/traffic.megaphone.fm/ADV1129270641…
#> $ userAgent <chr> "Spotify/8.9.42.575 Android/33 (LM-G900)"
#> $ range <chr> "bytes=0-3145727"
#> $ edgeColo <chr> "ORD"
#> $ continent <chr> "NA"
#> $ country <chr> "US"
#> $ timezone <chr> "America/New_York"
#> $ regionCode <chr> "KY"
#> $ region <chr> "Kentucky"
#> $ metroCode <chr> "541"
In addition to retrieving logs across all shows, we can filter the logs returned for a particular podcast by setting the url parameter to either a direct link to a podcast episode or a version of the URL with a wildcard placeholder * to signal multiple episodes.
How can we obtain these episode download URLs? You could obtain the podcast’s RSS feed and manually inspect each episode entry to find the direct link to an episode. However, another R package called {podindexr} contains many functions to obtain metadata for podcasts registered on the Podcast Index. For example, the episodes_byurl() function takes a podcast RSS feed as input and returns a collection of metadata associated with up to 1,000 episodes of a podcast. Here is the metadata associated with the three most recent episodes of the R Weekly Highlights podcast:
# remotes::install_github("rpodcast/podindexr")
library(podindexr)
episode_df <- episodes_byurl(rweekly_show_rss, max = 3)
glimpse(episode_df)
#> Rows: 3
#> Columns: 30
#> $ id <dbl> 22601290393, 22315244014, 22028048965
#> $ title <chr> "Issue 2024-W20 Highlights", "Issue 2024-W19 Highl…
#> $ link <chr> "https://serve.podhome.fm/episodepage/r-weekly-hig…
#> $ description <chr> "An aesthetically-pleasing journey through the his…
#> $ guid <chr> "54574074-0a1f-4951-8f2a-9084678c6604", "8c23a72a-…
#> $ datePublished <int> 1715756520, 1715152260, 1714548120
#> $ datePublishedPretty <chr> "May 15, 2024 2:02am", "May 08, 2024 2:11am", "May…
#> $ dateCrawled <int> 1715756613, 1715756613, 1715756613
#> $ enclosureUrl <chr> "https://op3.dev/e/serve.podhome.fm/episode/99cfd3…
#> $ enclosureType <chr> "audio/mpeg", "audio/mpeg", "audio/mpeg"
#> $ enclosureLength <int> 71545543, 71466423, 53758290
#> $ duration <int> 2955, 2946, 2215
#> $ explicit <int> 0, 0, 0
#> $ episode <int> 165, 164, 163
#> $ episodeType <chr> "full", "full", "full"
#> $ season <int> 0, 0, 0
#> $ image <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ feedItunesId <int> 1527876975, 1527876975, 1527876975
#> $ feedUrl <chr> "https://serve.podhome.fm/rss/bb28afcc-137e-5c66-b…
#> $ feedImage <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ feedId <int> 1062040, 1062040, 1062040
#> $ podcastGuid <chr> "bb28afcc-137e-5c66-b231-4ffad7979b44", "bb28afcc-…
#> $ feedLanguage <chr> "en-us", "en-us", "en-us"
#> $ feedDead <int> 0, 0, 0
#> $ chaptersUrl <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ transcriptUrl <chr> "https://assets.podhome.fm/99cfd30f-e40c-426a-3557…
#> $ soundbite <list> [915, 264, "Intro to DuckDB"], <NULL>, <NULL>
#> $ soundbites <list> [[915, 264, "Intro to DuckDB"], [915, 264, "Intro …
#> $ persons <list> [[38105478, "Mike Thomas", "host", "cast", "https:…
#> $ transcripts <list> [["https://assets.podhome.fm/99cfd30f-e40c-426a-35…
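The datePublished and dateCrawled columns are integers rather than date-times. Assuming they hold Unix epoch seconds (an assumption based on their magnitude, not something stated in the output), base R can convert them. The sketch below uses the first datePublished value copied from the output above.

```r
# First datePublished value copied from the output above
date_published <- 1715756520

# Assumption: the value counts seconds since 1970-01-01 00:00:00 UTC
published_utc <- as.POSIXct(date_published, origin = "1970-01-01", tz = "UTC")
format(published_utc, "%Y-%m-%d %H:%M:%S")
#> [1] "2024-05-15 07:02:00"
```

The 2:02am shown in datePublishedPretty is five hours behind this UTC time, which suggests the pretty string is rendered in a UTC-5 time zone.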
The enclosureUrl field gives us the direct link to each podcast episode:
pull(episode_df, enclosureUrl)
#> [1] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/63851340872787683054574074-0a1f-4951-8f2a-9084678c6604v1.mp3"
#> [2] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/6385073280234160488c23a72a-d2f2-49e8-a460-df0535340d29v1.mp3"
#> [3] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/6385013221315251258372ff1e-a595-4b18-a1cd-036a28153ccdv1.mp3"
As seen in the output above, we can create a wildcard version of the URL by replacing the MP3 file name at the end with a * character. Now we can run a revised call to op3_hits() using this URL as a parameter:
wildcard_url <- "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/*"
op3_hits(limit = 20, url = wildcard_url)
#> # A tibble: 20 × 15
#> time uuid hashedIpAddress method url userAgent edgeColo
#> <dttm> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2024-05-25 18:11:30 a11378c4… be723a50c9306d… GET http… Podcasts… DFW
#> 2 2024-05-25 18:00:33 b0992f89… 3288d89f797364… GET http… AppleCor… YYZ
#> 3 2024-05-25 17:59:33 d5e4eb49… 3288d89f797364… GET http… AppleCor… YYZ
#> 4 2024-05-25 17:58:04 ebbd6247… 3288d89f797364… GET http… AppleCor… YYZ
#> 5 2024-05-25 17:56:22 8c41efa0… 3288d89f797364… GET http… AppleCor… YYZ
#> 6 2024-05-25 17:56:22 d969ae1e… 3288d89f797364… GET http… AppleCor… YYZ
#> 7 2024-05-25 17:56:20 022bb28a… 3288d89f797364… GET http… AppleCor… YYZ
#> 8 2024-05-25 16:01:02 92326e56… d02c4cda211ac8… HEAD http… iTMS SJC
#> 9 2024-05-25 14:38:35 5062341d… 0d48c6b69d4fe3… GET http… CastBox/… FRA
#> 10 2024-05-25 11:57:22 b4624c5a… b7bc8dca9ca126… GET http… CastBox/… LHR
#> 11 2024-05-25 11:57:21 1221118e… b7bc8dca9ca126… GET http… CastBox/… LHR
#> 12 2024-05-25 11:57:16 8e46ad9a… b7bc8dca9ca126… GET http… CastBox/… LHR
#> 13 2024-05-25 11:57:14 b0b6a72a… b7bc8dca9ca126… GET http… CastBox/… LHR
#> 14 2024-05-25 11:57:05 e2ff1880… b7bc8dca9ca126… GET http… CastBox/… LHR
#> 15 2024-05-25 11:57:04 62597cd8… b7bc8dca9ca126… GET http… CastBox/… LHR
#> 16 2024-05-25 09:28:29 f0c630db… 4759e0cfecd409… GET http… AntennaP… CPH
#> 17 2024-05-25 09:28:28 a9f8bc2c… 4759e0cfecd409… GET http… AntennaP… CPH
#> 18 2024-05-25 09:17:36 247467ca… aba6ac051add9a… HEAD http… iTMS SEA
#> 19 2024-05-25 06:39:17 046c6786… 8d24f67d8898e3… GET http… atc/1.0 MCI
#> 20 2024-05-25 04:43:20 be539b09… 77f0f5aedc514b… HEAD http… Apache-H… FRA
#> # ℹ 8 more variables: continent <chr>, country <chr>, timezone <chr>,
#> # regionCode <chr>, region <chr>, metroCode <chr>, range <chr>, xpsId <chr>
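The wildcard URL above was typed by hand. As a sketch, it could also be derived from an episode's enclosure URL by replacing the final path segment with *. The helper make_wildcard_url() is hypothetical and not part of op3r or podindexr, and it assumes the URL ends in a file name with no query string.

```r
# Hypothetical helper (not part of op3r): replace the final path segment
# of a URL (the file name) with a "*" wildcard.
make_wildcard_url <- function(url) {
  sub("[^/]+$", "*", url)
}

# First enclosure URL copied from the output shown earlier
episode_url <- "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/63851340872787683054574074-0a1f-4951-8f2a-9084678c6604v1.mp3"

make_wildcard_url(episode_url)
#> [1] "https://op3.dev/e/serve.podhome.fm/episode/99cfd30f-e40c-426a-3557-08dc4ea63bb0/*"
```

Because sub() is vectorized, the same helper can be applied to a whole enclosureUrl column, with unique() used to collapse episodes that share a directory into a single wildcard URL.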