Authored by Dominik Bönisch on Jun 25, 2020.
### Scraping data from an art museum collection

The data is made available by the [National Gallery of Denmark (SMK)](https://www.smk.dk/en/article/smk-open/) through an Application Programming Interface (API). According to SMK, metadata for more than 70,000 digitised artworks is available; at the time the prototype was developed (March 2020), records for 79,002 objects could actually be accessed. For the prototype, only those records that include an image file were processed.

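A minimal Python sketch of paging through the collection and keeping only records that have an image might look as follows. It is not the code from the original notebook: the endpoint URL, the `keys`/`offset`/`rows` query parameters and the `found`, `items` and `has_image` response fields are assumptions based on the public SMK API documentation and should be verified there, and the helper names `fetch_page`, `iter_records` and `records_with_image` are hypothetical.

```python
import json
from typing import Callable, Iterable, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: search endpoint of the SMK Open API; verify against the SMK docs.
SEARCH_URL = "https://api.smk.dk/api/v1/art/search/"

# A page fetcher takes (offset, rows) and returns one decoded JSON page.
PageFetcher = Callable[[int, int], dict]


def fetch_page(offset: int, rows: int) -> dict:
    """Request one page of search results and decode the JSON body."""
    # Assumption: 'keys=*' matches all objects; 'offset'/'rows' control paging.
    query = urlencode({"keys": "*", "offset": offset, "rows": rows})
    with urlopen(f"{SEARCH_URL}?{query}") as response:
        return json.load(response)


def iter_records(fetch: PageFetcher = fetch_page, page_size: int = 100) -> Iterator[dict]:
    """Yield every object record, assuming each page has 'found' and 'items'."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        items = page.get("items") or []
        yield from items
        offset += len(items)
        # Stop on an empty page or once the reported total has been reached.
        if not items or offset >= page.get("found", 0):
            return


def records_with_image(records: Iterable[dict]) -> List[dict]:
    """Keep only records flagged as having an image (assumed 'has_image' field)."""
    return [record for record in records if record.get("has_image")]
```

Calling `records = list(iter_records())` would then collect all object records, and `records_with_image(records)` the subset relevant to the prototype. Passing the page fetcher as a parameter keeps the paging logic testable without network access.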
For this purpose, the identification numbers of all artworks were read out via the standardized International Image Interoperability Framework (IIIF) interface, and the link for each ID was then checked for an associated thumbnail, which was downloaded where available (44,626 images). In addition, each scraped work was checked for whether it is in the public domain or protected by copyright (32,411 images are free of copyright). This information is intended to ensure that the image data is handled carefully from the outset and that no copyrights are infringed.

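The thumbnail check and the copyright tally can be sketched in the same spirit. The IIIF Image API addresses an image as `{base}/{region}/{size}/{rotation}/{quality}.{format}`, so a thumbnail request only needs the image's base URI. Which record field holds that URI (here `image_iiif_id`), the `object_number` ID and the boolean `public_domain` flag are assumptions about the SMK records, and `collect_thumbnails` is a hypothetical helper rather than the notebook's implementation.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple
from urllib.request import urlopen

# A fetcher takes a URL and returns image bytes, or None if nothing was served.
Fetcher = Callable[[str], Optional[bytes]]


def iiif_thumbnail_url(image_base: str, max_px: int = 300) -> str:
    """Build an IIIF Image API URL for the full image scaled to fit a max_px box.

    Segments: region=full, size=!w,h (best fit, aspect ratio kept),
    rotation=0, quality=default, format=jpg.
    """
    return f"{image_base.rstrip('/')}/full/!{max_px},{max_px}/0/default.jpg"


def download_thumbnail(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Return the image bytes, or None if the request raises an OSError."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:  # URLError, HTTPError and socket timeouts derive from OSError
        return None


def collect_thumbnails(
    records: Iterable[dict], fetch: Fetcher = download_thumbnail
) -> Tuple[Dict[str, bytes], Set[str]]:
    """Download available thumbnails and note which of them are public domain."""
    thumbnails: Dict[str, bytes] = {}
    public_domain: Set[str] = set()
    for record in records:
        # Assumption: the IIIF base URI is stored under 'image_iiif_id'.
        base = record.get("image_iiif_id")
        if not base:
            continue
        data = fetch(iiif_thumbnail_url(base))
        if data is None:
            continue  # no thumbnail served for this ID
        # Assumption: 'object_number' identifies the artwork.
        object_id = record["object_number"]
        thumbnails[object_id] = data
        # Assumption: a boolean 'public_domain' flag marks copyright-free works.
        if record.get("public_domain"):
            public_domain.add(object_id)
    return thumbnails, public_domain
```

With this split, `len(thumbnails)` corresponds to the number of downloaded images and `len(public_domain)` to the copyright-free subset. Injecting `fetch` keeps the filtering logic testable offline with canned bytes.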
The [notebook](https://github.com/DominikBoenisch/Training-the-Archive/blob/master/Prototype/1_Scraper/Scraper.ipynb) was made possible with the help of [Dr. rer. nat. Jan Sölter](https://de.linkedin.com/in/jansoelter).