Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
Dominik Bönisch authored Jun 25, 2020
1 parent f2c12ec commit f6c32de
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion Prototype/1_Scraper/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
### Scraping data from an art museum collection

The data is made available by the [National Gallery of Denmark (SMK)](https://www.smk.dk/en/article/smk-open/) via a so-called Application Programming Interface (API). According to the SMK meta-information of more than 70,000 digitised artworks are available. In fact, 79,002 object data can be accessed at the state of prototype development (March 2020). For the prototype only those data sets were scanned which contained an image file. For this purpose, all identification numbers of the works of art were read out via the standardized International Image Interoperability Framework (IIIF) interface and then checked for each ID link to see whether an associated thumbnail was available and loaded if necessary (44,626 images). In addition, a query was made as to whether the scraped work is in the public domain or whether it is copyrighted (32,411 images without copyright). This information is intended to help ensure that the image data is handled confidentially from the outset and that no copyrights are violated.
The data is made available by the [National Gallery of Denmark (SMK)](https://www.smk.dk/en/article/smk-open/) via a so-called Application Programming Interface (API). According to the SMK meta-information of more than 70,000 digitised artworks are available. In fact, 79,002 object data can be accessed at the state of prototype development (March 2020). For the prototype only those data sets were scanned which contained an image file.

For this purpose, all identification numbers of the works of art were read out via the standardized International Image Interoperability Framework (IIIF) interface and then checked for each ID link to see whether an associated thumbnail was available and loaded if necessary (44,626 images). In addition, a query was made as to whether the scraped work is in the public domain or whether it is copyrighted (32,411 images without copyright). This information is intended to help ensure that the image data is handled confidentially from the outset and that no copyrights are violated.

The [notebook](https://github.com/DominikBoenisch/Training-the-Archive/blob/master/Prototype/1_Scraper/Scraper.ipynb) was made possible with the help of [Dr. rer. nat. Jan Sölter](https://de.linkedin.com/in/jansoelter).

0 comments on commit f6c32de

Please sign in to comment.