Extract metadata from common life-science electron microscopy data and write it in OSC-EM format. Supported sources:
- SerialEM
- Thermo Fisher EPU
- TOMO5
Binaries for Mac, Linux, and Windows can be downloaded from our releases page. Alternatively, you can compile from source by running:
go build -o LS_Metadata_reader .
The release executables for macOS are not signed. You may get a warning that macOS cannot verify the developer or check the binary for malicious software. If downloaded directly from GitHub, this executable should be safe to run. You can bypass the warning by running the command:
xattr -d com.apple.quarantine LS_Metadata_reader
!!! Requires SerialEM 4.2.0 or newer !!!
SerialEM requires some additional configuration to ensure that all required information is available in the mdoc files.
- Add instrument properties to SerialEMproperties.txt. See the example. Update values to reflect your instrument parameters.
- Two scripts are provided in SerialEM_Scripts/, one for SPA and one for Tomography datasets. One of these should be run after each image collection (the lowest tick mark on the SerialEM automation script selection). Otherwise SerialEM output will lack a few fields required by the schema.
Some instrument data is not available in EPU output. This is normally set in a configuration file, but can also be added at the command line using parameters.
A wizard is available to walk through creating the configuration file. Run it using
LS_Metadata_reader --c
The configuration file is saved in the following locations depending on your platform:
- Unix: $XDG_CONFIG_HOME/LS_Metadata_reader/LS_reader.conf (usually $HOME/.config/LS_Metadata_reader/LS_reader.conf)
- macOS: $HOME/Library/Application Support/LS_Metadata_reader/LS_reader.conf
- Windows: %AppData%\LS_Metadata_reader\LS_reader.conf
Config values can also be set using the command line flags:
Config property | CLI Option | Required | Description |
---|---|---|---|
CS | --cs | yes | the CS value of the instrument |
Gainref_FlipRotate | --gain_flip_rotate | yes | the orientation of the gain_reference relative to actual data |
MPCPATH | --epu | | Path to EPU metadata directory |
EPU writes its metadata files to a different directory than its actual data (TOMO5 also keeps some additional information there, which LS_Metadata_reader processes). It generates a second set of folders, usually on the microscope control computer, that mirror the directory structure of its OffloadData folders. These hold related information, including the metadata xml files. If --epu is set as a flag or in the config, LS_Metadata_reader fetches those files directly when the user points it at an OffloadData directory.
NOTE: This requires mounting the microscope computer's EPU directory on the machine running LS_Metadata_reader, as the two are most likely NOT the same. The extractor works regardless when pointed at the xml/mdoc files directly; this option is just for convenience.
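The relationship between config properties and command-line flags can be sketched with Go's standard `flag` package: values from the config file serve as defaults, and a flag given on the command line overrides them. This is only an illustration under assumptions. The flag names come from the table above, but the function `resolve`, the use of plain strings for all values, and the sample value `2.7` are hypothetical and not taken from the tool's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolve starts from values read from the config file and lets
// command-line flags override them. Values are kept as strings
// because their real types are not specified in this README.
func resolve(conf map[string]string, args []string) (map[string]string, error) {
	fs := flag.NewFlagSet("LS_Metadata_reader", flag.ContinueOnError)
	cs := fs.String("cs", conf["CS"], "CS value of the instrument")
	gain := fs.String("gain_flip_rotate", conf["Gainref_FlipRotate"], "gain reference orientation")
	epu := fs.String("epu", conf["MPCPATH"], "path to EPU metadata directory")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return map[string]string{
		"CS":                 *cs,
		"Gainref_FlipRotate": *gain,
		"MPCPATH":            *epu,
	}, nil
}

func main() {
	// Hypothetical config content, standing in for LS_reader.conf.
	conf := map[string]string{"CS": "2.7"}
	out, err := resolve(conf, os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("CS:", out["CS"])
	fmt.Println("Gainref_FlipRotate:", out["Gainref_FlipRotate"])
	fmt.Println("MPCPATH:", out["MPCPATH"])
}
```

Go's `flag` package accepts both `-cs` and `--cs`, so the double-dash spelling used in the table parses as written. The sketch does not check that the required properties are actually set.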
The reader should be called with the path to a folder containing the xml (EPU/TOMO5) or mdoc (SerialEM) files.
./LS_Metadata_reader -o tutorial_oscem.json tutorial/
For testing, try the associated tutorial folder; an example of what the output should look like is provided in the same folder (tutorial_correct.json). On first use, disregard the warnings about config/flags; those apply only when working directly with EPU or the OpenEM Ingestor.
The reader runs on a directory containing the microscope's additional information files for each micrograph (.mdoc or .xml for SerialEM and EPU, respectively). It generates a JSON file following the OSC-EM schema with metadata for the whole dataset. For usage with EPU, pointing to the top level directory is enough; it will search for the data folders and extract the info from there.
Using -z you can also obtain a zip file of the xml files associated with your data collection. This can be useful for archiving or for later analysis.
To include additional metadata not supported by the OSC-EM schema, use the -f flag. This will include all available dataset-level metadata.
This tool is a compatible metadata extractor for use with the SciCat Web Ingestor. It can be installed automatically by including the following in your ingestor configuration file:
MetadataExtractors:
  - Name: LS
    GithubOrg: SwissOpenEM
    GithubProject: LS_Metadata_reader
    Version: v0.3.0
    Executable: LS_Metadata_reader
    Checksum: 805fd036f2c83284b2cd70f2e7f3fafbe17bc750d2156f604c1505f7d5791d75
    ChecksumAlg: sha256
    CommandLineTemplate: "-i '{{.SourceFolder}}' -o '{{.OutputFile}}'"
    Methods:
      - Name: Single Particle
        Schema: oscem_schemas.schema.json
      - Name: Cellular Tomography
        Schema: oscem_cellular_tomo.json
      - Name: Tomography
        Schema: oscem_tomo.json
      - Name: EnvironmentalTomography
        Schema: oscem_env_tomo.json
This will automatically download and install the specified version of LS_Metadata_reader.
Output is compatible with the OSC-EM schemas: https://github.com/osc-em/OSCEM_Schemas/
The specific schema used to generate schema-conformant output (works for both SPA and Tomography), processed with LinkML gen-golang: https://github.com/osc-em/OSCEM_Schemas/blob/linkml_yaml/src/oscem_schemas/schema/oscem_schemas_tomo.yaml