Thursday, February 17, 2022

Using the API to search for and download Parabaik illustrations from The Met


While looking for digital resources on white “parabaiks”, I was excited to find that The Met offers API access to all of its artwork in the public domain. I had first browsed the Open Access artworks in The Online Collection by selecting the Open Access filter in search and searching for “parabaik”. The result was:

From there I was able to view all the images of the parabaik in the viewer, zoom in at will, and download them at the high resolution at which they were originally digitized. However, the prospect of getting this, or any other high-resolution image at The Metropolitan Museum of Art, for my digital collection through The Met’s API was too great to resist. One big advantage of API access is that I can download the whole range of images of the parabaik, or any part of it, in one go instead of downloading them one by one from the viewer. The parabaik contains fifteen scenes, composed in a series of two-, four-, and six-page compositions, spanning a total of 66 pages.

I know a bit about the R software environment, so for this job I decided to use The Met’s API from RStudio. How to Access Any RESTful API Using the R Language by Andrew Carpenter gave me the insight and inspiration to try writing an R script to access The Met’s parabaik collection. The steps involved were:

  • Install the “httr” and “jsonlite” packages
  • Make a “GET” request to the API to pull raw data into your environment
  • “Parse” that data from its raw form through JavaScript Object Notation (JSON) into a usable format
  • Write a loop to “page” through that data and retrieve the full data set
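
The steps above can be sketched roughly as follows. This is only a minimal sketch: the helper names parse_body, fetch_json and fetch_all are made up for illustration and are not part of httr or jsonlite, and the "loop" step is shown here as a simple loop over a vector of URLs.

```r
# Step 1: install the packages once (uncomment on first use)
# install.packages(c("httr", "jsonlite"))
library(httr)
library(jsonlite)

# Step 3 (hypothetical helper): parse a raw response body into an R list
parse_body <- function(raw_body) {
  fromJSON(rawToChar(raw_body))
}

# Step 2 (hypothetical helper): GET a URL, stop on an HTTP error,
# and return the parsed JSON
fetch_json <- function(url, ...) {
  resp <- GET(url, ...)
  stop_for_status(resp)
  parse_body(resp$content)
}

# Step 4 (hypothetical helper): loop over several URLs and collect the results
fetch_all <- function(urls) {
  lapply(urls, fetch_json)
}
```

Calling stop_for_status() before parsing means a failed request stops with an error message instead of passing an error page to fromJSON().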

As usual, I had to go through a lot of trial and error, and again I was helped by the Q&As on Stack Overflow.

Information on the endpoints, objects, request syntax, and response items for The Met’s API is available here.


Search the information for parabaiks


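# Load the packages for HTTP requests and JSON parsing
library(httr)
library(jsonlite)
# Search The Met's collection for "parabaik"
x <- GET("https://collectionapi.metmuseum.org/public/collection/v1/search?q=parabaik")
# The response body is raw bytes; convert it to a JSON string
rawToChar(x$content)
[1] "{\"total\":1,\"objectIDs\":[744940]}"
# Parse the JSON string into an R list
xdata <- fromJSON(rawToChar(x$content))
names(xdata)
[1] "total"     "objectIDs"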
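
The same search can be wrapped in a small function so that the search term is passed as a query parameter and a search with no hits does not break later code. This is a sketch under my own assumptions: search_url, object_ids and search_met are hypothetical names, and I assume the API reports an empty result as a null objectIDs field.

```r
library(httr)
library(jsonlite)

base_url <- "https://collectionapi.metmuseum.org/public/collection/v1"

# Hypothetical helper: build the search URL for a term (URL-encodes the term)
search_url <- function(term) {
  modify_url(paste0(base_url, "/search"), query = list(q = term))
}

# Hypothetical helper: pull the object IDs out of a parsed search result,
# returning an empty vector when nothing was found (assumed to be null)
object_ids <- function(result) {
  if (is.null(result$objectIDs)) integer(0) else result$objectIDs
}

# Hypothetical helper: run the search and return the matching object IDs
search_met <- function(term) {
  resp <- GET(search_url(term))
  stop_for_status(resp)
  object_ids(fromJSON(content(resp, as = "text", encoding = "UTF-8")))
}

# Usage (needs network access):
# ids <- search_met("parabaik")
```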


Get the raw data and convert it into a usable format

# Get objectIDs
xdata$objectIDs
[1] 744940
# Pull raw data
xy <- GET("https://collectionapi.metmuseum.org/public/collection/v1/objects/744940")
# Parse the raw JSON response into an R list
PBKdata <- fromJSON(rawToChar(xy$content))
# View the metadata fields available for the parabaik
names(PBKdata)
 [1] "objectID"              "isHighlight"           "accessionNumber"      
 [4] "accessionYear"         "isPublicDomain"        "primaryImage"         
 [7] "primaryImageSmall"     "additionalImages"      "constituents"         
[10] "department"            "objectName"            "title"                
[13] "culture"               "period"                "dynasty"              
[16] "reign"                 "portfolio"             "artistRole"           
[19] "artistPrefix"          "artistDisplayName"     "artistDisplayBio"     
[22] "artistSuffix"          "artistAlphaSort"       "artistNationality"    
[25] "artistBeginDate"       "artistEndDate"         "artistGender"         
[28] "artistWikidata_URL"    "artistULAN_URL"        "objectDate"           
[31] "objectBeginDate"       "objectEndDate"         "medium"               
[34] "dimensions"            "measurements"          "creditLine"           
[37] "geographyType"         "city"                  "state"                
[40] "county"                "country"               "region"               
[43] "subregion"             "locale"                "locus"                
[46] "excavation"            "river"                 "classification"       
[49] "rightsAndReproduction" "linkResource"          "metadataDate"         
[52] "repository"            "objectURL"             "tags"                 
[55] "objectWikidata_URL"    "isTimelineWork"        "GalleryNumber"        
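
Of these fields, primaryImage and additionalImages are the ones needed for downloading. Below is a minimal sketch of collecting every image URL from one object record. The names fetch_object and image_urls are hypothetical, and the sample record contains only values that appear in this post, so the helper can be tried without network access.

```r
library(jsonlite)

base_url <- "https://collectionapi.metmuseum.org/public/collection/v1"

# Hypothetical helper: fetch one object record by ID (needs network access)
fetch_object <- function(id) {
  fromJSON(sprintf("%s/objects/%d", base_url, as.integer(id)))
}

# Hypothetical helper: collect all non-empty image URLs from one object record
image_urls <- function(obj) {
  urls <- as.character(c(obj$primaryImage, unlist(obj$additionalImages)))
  urls[nzchar(urls)]
}

# Small sample record built only from values shown in this post
sample_json <- '{
  "objectID": 744940,
  "primaryImage": "https://images.metmuseum.org/CRDImages/as/original/DP-14374-040.jpg",
  "additionalImages": [
    "https://images.metmuseum.org/CRDImages/as/original/DP-14374-003.jpg",
    "https://images.metmuseum.org/CRDImages/as/original/DP-14374-004.jpg"
  ]
}'
obj <- fromJSON(sample_json)
basename(image_urls(obj))

# Usage with the live API (needs network access):
# all_urls <- image_urls(fetch_object(744940))
```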


Create the directory to download the images

p <- file.path("C:", "DATA", "parabaikMET", fsep = "/")
p
[1] "C:/DATA/parabaikMET"
# create a directory for it all
dir.create(p)
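
If the parent folder does not exist yet, or the script is run a second time, dir.create(p) gives a warning instead of creating the directory. A small sketch of a more forgiving version follows. The name ensure_dir is hypothetical, and tempdir() is used here only so that the example runs on any machine.

```r
# Hypothetical helper: create a directory (and missing parents) only if needed
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Example with a temporary location instead of "C:/DATA/parabaikMET"
p <- ensure_dir(file.path(tempdir(), "DATA", "parabaikMET"))
dir.exists(p)
```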


Download the images

For the purpose of this exercise, we download only the primaryImage and the first two of the additionalImages.

# Get urls
urls <- c(PBKdata$primaryImage, PBKdata$additionalImages[1:2]) 
urls
[1] "https://images.metmuseum.org/CRDImages/as/original/DP-14374-040.jpg"
[2] "https://images.metmuseum.org/CRDImages/as/original/DP-14374-003.jpg"
[3] "https://images.metmuseum.org/CRDImages/as/original/DP-14374-004.jpg"
# iterate and download
lapply(urls, function(url) download.file(url, file.path(p, basename(url)),mode="wb"))
# Check the directory
list.files(p)
[1] "DP-14374-003.jpg" "DP-14374-004.jpg" "DP-14374-040.jpg"

Screenshot of the directory
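
To fetch the whole range of images rather than three, the same idea can be put into a function that skips files already on disk and carries on when one download fails. This is only a sketch: download_images is a hypothetical name, the downloader argument exists just so the function can be tried without network access, and the one-second pause is my own choice to avoid hammering the server, not something the API requires as far as I know.

```r
# Hypothetical helper: download every URL into dest_dir, skipping existing files
download_images <- function(urls, dest_dir, downloader = download.file, pause = 1) {
  dests <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    if (file.exists(dests[i])) next  # already downloaded earlier
    tryCatch(
      downloader(urls[i], dests[i], mode = "wb"),
      error = function(e) {
        # Report the failure and carry on with the next image
        message("Failed: ", urls[i], " (", conditionMessage(e), ")")
      }
    )
    Sys.sleep(pause)  # short pause between requests
  }
  invisible(dests)
}

# Usage with the objects from this post (needs network access):
# download_images(c(PBKdata$primaryImage, PBKdata$additionalImages), p)
```

Because existing files are skipped, the function can be re-run after an interruption without downloading everything again.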
