Merge pull request #82 from openpreserve/feat/spreadsheet-validation
FEAT: Spreadsheet validation and dev docs
carlwilson authored Nov 8, 2023
2 parents db5f339 + 48adcf9 commit 694107d
Showing 14 changed files with 284 additions and 14 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -17,6 +17,8 @@ You may read more about the technical details of the validation checks [here](do

## Quick Start

For developer instructions, including Maven locations and examples, see [DEVELOPER.md](docs/DEVELOPER.md).

### Prerequisites

To run the software you'll need a [Java 8](https://www.java.com/en/download/manual.jsp) JRE or newer.
155 changes: 155 additions & 0 deletions docs/DEVELOPER.md
@@ -0,0 +1,155 @@
# ODF Validation developer documentation

This guide is for developers who want to integrate the ODF Validator into their own applications. You'll need the odf-core package, which is currently hosted in the OPF's Maven repository.

## Setting up the Maven repository

For now, the Maven artefacts are hosted on the OPF's Artifactory server. To use them, add the following to your Maven settings file (usually ~/.m2/settings.xml):

```xml
<settings xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd" xmlns="http://maven.apache.org/SETTINGS/1.1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <profiles>
    <profile>
      <repositories>
        <repository>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
          <id>central</id>
          <name>opf-dev</name>
          <url>https://artifactory.openpreservation.org/artifactory/opf-dev</url>
        </repository>
        <repository>
          <snapshots />
          <id>snapshots</id>
          <name>opf-dev</name>
          <url>https://artifactory.openpreservation.org/artifactory/opf-dev</url>
        </repository>
      </repositories>
      <id>artifactory</id>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>artifactory</activeProfile>
  </activeProfiles>
</settings>
```

## Including the core validation library

To include the core validation library in your project, add the following dependency to your pom.xml:

```xml
<dependency>
  <groupId>org.openpreservation.odf</groupId>
  <artifactId>odf-core</artifactId>
  <version>0.9.0</version>
</dependency>
```

## Parsing an ODF package

The library supports a non-validating parse of an ODF package. Parsing is a prerequisite for validation, which is performed against a parsed package instance. The following code snippet shows how to parse an ODF package:

```java
import java.io.File;
import java.io.InputStream;

import org.openpreservation.odf.pkg.FileEntry;
import org.openpreservation.odf.pkg.Manifest;
import org.openpreservation.odf.pkg.OdfPackage;
import org.openpreservation.odf.pkg.OdfPackages;
import org.openpreservation.odf.pkg.PackageParser;

// Get a package parser instance
PackageParser packageParser = OdfPackages.getPackageParser();

File packageFile = new File("path/to/package.ods");
OdfPackage odfPackage = packageParser.parsePackage(packageFile);

// Get the package manifest
Manifest manifest = odfPackage.getManifest();

// Get the file entries from the manifest
for (FileEntry entry : manifest.getEntries()) {
    // Get the entry's declared MIME type
    String mediaType = entry.getMediaType();
    // Get the entry's declared full path
    String fullPath = entry.getFullPath();
    // Get the entry's input stream; try-with-resources closes it afterwards
    try (InputStream is = odfPackage.getEntryStream(entry)) {
        // Do something with the entry
    }
}
```
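
For orientation, an ODF package is a Zip archive, so its entry names and its `mimetype` entry can be inspected with nothing more than the JDK. The following is a minimal sketch using `java.util.zip`. It does not use or reproduce the odf-core API, and the class name `OdfZipSketch`, its methods, and the in-memory sample archive are all hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of odf-core
public class OdfZipSketch {

    /** Returns the entry names of a Zip stream in archive order. */
    public static List<String> listEntryNames(InputStream zipStream) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Returns the text of the "mimetype" entry, or null when there is none. */
    public static String readMimetype(InputStream zipStream) {
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!"mimetype".equals(entry.getName())) {
                    continue;
                }
                // Copy the current entry's bytes until the end of the entry
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[1024];
                int read;
                while ((read = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                return new String(out.toByteArray(), StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    /** Builds a tiny in-memory Zip for demonstration; it is not a valid ODF document. */
    public static byte[] samplePackage() {
        String[][] entries = {
            {"mimetype", "application/vnd.oasis.opendocument.spreadsheet"},
            {"META-INF/manifest.xml", "<manifest/>"},
            {"content.xml", "<content/>"}
        };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String[] pair : entries) {
                zos.putNextEntry(new ZipEntry(pair[0]));
                zos.write(pair[1].getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] pkg = samplePackage();
        System.out.println(listEntryNames(new ByteArrayInputStream(pkg)));
        System.out.println(readMimetype(new ByteArrayInputStream(pkg)));
    }
}
```

In an application the library's `PackageParser` is the better choice, since it also exposes the manifest and its declared media types. The sketch only shows what lies underneath the package abstraction.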

## Validating an ODF package

```java
import java.io.File;
import java.util.List;

import org.openpreservation.messages.Message;
import org.openpreservation.odf.pkg.OdfPackage;
import org.openpreservation.odf.validation.ValidatingParser;
import org.openpreservation.odf.validation.ValidationReport;
import org.openpreservation.odf.validation.Validators;

ValidatingParser packageParser = Validators.getValidatingParser();

File packageFile = new File("path/to/package.ods");

// Get the OdfPackage instance from the parser
OdfPackage odfPackage = packageParser.parsePackage(packageFile.toPath());

// Now validate the package and get the validation report
ValidationReport report = packageParser.validatePackage(odfPackage);

// Is the package valid?
if (report.isValid()) {
    System.out.println("Package is valid");
    // Get any warning or info messages (no errors as the package is valid)
    List<Message> messages = report.getMessages();
    // Loop through the messages
    for (Message message : messages) {
        // Get the message id
        System.out.println(message.getId());
        // Get the message severity (INFO, WARNING, ERROR)
        System.out.println(message.getSeverity());
        // Print out the message text
        System.out.println(message.getMessage());
    }
} else {
    System.out.println("Package is not valid");
    // Get the error messages
    List<Message> messages = report.getErrors();
    for (Message message : messages) {
        // Get the message id
        System.out.println(message.getId());
        // Print out the message text
        System.out.println(message.getMessage());
    }
}
```
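
The snippet above implies that a report is valid when it holds no ERROR messages, while INFO and WARNING messages may still be present. The sketch below models that relationship with stand-in types. `ReportSketch`, `Severity`, and `Msg` are hypothetical names invented for this illustration and are not the library's `ValidationReport` or `Message` classes, whose internals are not shown here. The id `DEMO-1` is also invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types, not part of odf-core
public class ReportSketch {

    /** Severity levels named in the snippet above. */
    public enum Severity { INFO, WARNING, ERROR }

    /** Minimal immutable message: an id, a severity, and some text. */
    public static final class Msg {
        public final String id;
        public final Severity severity;
        public final String text;

        public Msg(String id, Severity severity, String text) {
            this.id = id;
            this.severity = severity;
            this.text = text;
        }
    }

    /** Returns only the messages whose severity is ERROR. */
    public static List<Msg> errorsOf(List<Msg> messages) {
        List<Msg> errors = new ArrayList<>();
        for (Msg message : messages) {
            if (message.severity == Severity.ERROR) {
                errors.add(message);
            }
        }
        return errors;
    }

    /** Assumption: a report is valid exactly when it has no ERROR messages. */
    public static boolean isValid(List<Msg> messages) {
        return errorsOf(messages).isEmpty();
    }

    public static void main(String[] args) {
        List<Msg> messages = Arrays.asList(
            new Msg("DEMO-1", Severity.WARNING, "An illustrative warning"),
            new Msg("PKG-4", Severity.ERROR, "An illustrative error"));
        System.out.println("valid: " + isValid(messages));
        for (Msg error : errorsOf(messages)) {
            System.out.println(error.id + ": " + error.text);
        }
    }
}
```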

## Validating spreadsheets only

The ODF Validator can be used to validate spreadsheets only. This is useful if you want to validate a spreadsheet without having to parse the entire package. The following code snippet shows how to validate a spreadsheet:

```java
import java.io.File;
import java.util.List;

import org.openpreservation.messages.Message;
import org.openpreservation.odf.validation.ValidationReport;
import org.openpreservation.odf.validation.Validator;

Validator validator = new Validator();
ValidationReport report = validator.validateSpreadsheet(new File("path/to/package.ods"));
if (!report.isValid()) {
    List<Message> messages = report.getMessages();
    // Loop through the messages
    for (Message message : messages) {
        // Get the message id
        System.out.println(message.getId());
        // Get the message severity (INFO, WARNING, ERROR)
        System.out.println(message.getSeverity());
        // Print out the message text
        System.out.println(message.getMessage());
    }
} else {
    System.out.println("The document is valid");
}
```
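
How the library decides that a document is a spreadsheet is not shown in this document. One plausible ingredient is a comparison of the declared MIME type against the registered OpenDocument spreadsheet type, `application/vnd.oasis.opendocument.spreadsheet`. The sketch below shows only that comparison. The class `SpreadsheetMimeSketch` is a hypothetical name, and treating templates and other types as non-spreadsheets is an assumption of the sketch.

```java
// Hypothetical helper class, not part of odf-core
public class SpreadsheetMimeSketch {

    /** Registered MIME type for OpenDocument spreadsheets (.ods). */
    public static final String ODS_MIME = "application/vnd.oasis.opendocument.spreadsheet";

    /**
     * Returns true when the declared MIME type is the ODS type.
     * Surrounding whitespace is ignored; null is treated as "not a spreadsheet".
     */
    public static boolean isSpreadsheet(String declaredMime) {
        if (declaredMime == null) {
            return false;
        }
        return ODS_MIME.equals(declaredMime.trim());
    }

    public static void main(String[] args) {
        System.out.println(isSpreadsheet(ODS_MIME));
        System.out.println(isSpreadsheet("application/vnd.oasis.opendocument.text"));
    }
}
```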
10 changes: 2 additions & 8 deletions docs/VALIDATION.md
@@ -25,6 +25,8 @@ All files contained in the Zip file shall be non compressed (STORED) or compress

### PKG-3

An OpenDocument package SHALL only contain the "META-INF/manifest.xml" and files containing the term "signatures" in their name in the "META-INF" folder. File %s does not meet this criterion.

It (an OpenDocument package) may contain files whose relative paths begin with “META-INF/” and whose names contain the string “signatures”. These files shall meet the following requirements:

* D.1: The files shall be well-formed XML files in accordance with [XML1.0].
@@ -33,14 +35,6 @@ It (an OpenDocument package) may contain files whose relative paths begin with

* D.3: The files shall be valid with respect to the digital signature schema defined in appendix A.2 OpenDocument Digital Signature Schema.
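
The path rule in PKG-3 can be sketched as a filter over entry paths: inside "META-INF/", only "META-INF/manifest.xml" and files whose names contain "signatures" are accepted. The code below is a minimal sketch of that rule alone. `MetaInfCheckSketch` is a hypothetical name and this is not the validator's implementation. It assumes that directory entries (paths ending in "/") are ignored and that sub-directories are allowed when the file name itself contains "signatures". The D.1 to D.3 checks on the signature files are out of scope.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of odf-core
public class MetaInfCheckSketch {

    private static final String META_INF = "META-INF/";
    private static final String MANIFEST = "META-INF/manifest.xml";

    /** Returns the META-INF file paths that break the PKG-3 path rule. */
    public static List<String> disallowedMetaInfEntries(List<String> entryPaths) {
        List<String> disallowed = new ArrayList<>();
        for (String path : entryPaths) {
            // Only files below META-INF/ are in scope; directory entries are skipped (assumption)
            if (!path.startsWith(META_INF) || path.endsWith("/")) {
                continue;
            }
            if (path.equals(MANIFEST)) {
                continue;
            }
            // Test the file name only, not the directories above it
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (!fileName.contains("signatures")) {
                disallowed.add(path);
            }
        }
        return disallowed;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "mimetype",
            "META-INF/manifest.xml",
            "META-INF/documentsignatures.xml",
            "META-INF/extra.txt",
            "content.xml");
        for (String path : disallowedMetaInfEntries(paths)) {
            System.out.println("PKG-3 violation: " + path);
        }
    }
}
```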

TODO: This needs expanding to cover digital signature file validation. It appears that sub-directories are valid if they contain digital signature files.

Should the presence of empty directories below META-INF be considered an error?

ALL files that don't contain the string "signatures" should be a validation error.

Any files that contain the string "signatures" should be checked against D1-D3. This is implemented but the reporting logic needs to be improved.

### PKG-9 (Error)

An OpenDocument package SHALL be a well formed Zip Archive.
2 changes: 1 addition & 1 deletion odf-apps/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>org.openpreservation.odf</groupId>
<artifactId>odf-validator</artifactId>
-<version>0.1.0-SNAPSHOT</version>
+<version>0.9.0</version>
</parent>

<artifactId>odf-apps</artifactId>
2 changes: 1 addition & 1 deletion odf-core/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>org.openpreservation.odf</groupId>
<artifactId>odf-validator</artifactId>
-<version>0.1.0-SNAPSHOT</version>
+<version>0.9.0</version>
</parent>

<groupId>org.openpreservation.odf</groupId>
@@ -0,0 +1,43 @@
package org.openpreservation.odf.document;

import java.util.Objects;

import org.openpreservation.format.xml.ParseResult;
import org.openpreservation.odf.pkg.OdfPackage;
import org.openpreservation.odf.xml.Metadata;
import org.openpreservation.odf.xml.OdfXmlDocument;
import org.openpreservation.odf.xml.OdfXmlDocuments;

public class Documents {
private Documents() {
throw new AssertionError("Utility class 'Documents' should not be instantiated");
}

public static final OpenDocument openDocumentOf(OdfDocument document) {
Objects.requireNonNull(document, "OdfDocument parameter document cannot be null");
return OpenDocumentImpl.of(document);
}

public static final OpenDocument openDocumentOf(OdfPackage pkg) {
Objects.requireNonNull(pkg, "OdfPackage parameter pkg cannot be null");
return OpenDocumentImpl.of(pkg);
}

public static final OdfDocument odfDocumentOf(final OdfXmlDocument xmlDocument, final Metadata metadata) {
Objects.requireNonNull(xmlDocument, "OdfXmlDocument parameter xmlDocument cannot be null");
Objects.requireNonNull(metadata, "Metadata parameter metadata cannot be null");
return OdfDocumentImpl.of(xmlDocument, metadata);
}

public static final OdfDocument odfDocumentOf(final ParseResult parseResult, final Metadata metadata) {
Objects.requireNonNull(parseResult, "ParseResult parameter parseResult cannot be null");
Objects.requireNonNull(metadata, "Metadata parameter metadata cannot be null");
return OdfDocumentImpl.of(OdfXmlDocuments.odfXmlDocumentOf(parseResult), metadata);
}

public static final OdfDocument odfDocumentOf(final ParseResult parseResult) {
Objects.requireNonNull(parseResult, "ParseResult parameter parseResult cannot be null");
return OdfDocumentImpl.of(parseResult);
}

}
@@ -22,12 +22,18 @@ static final OdfDocument of(final OdfXmlDocument xmlDocument, final Metadata met
Objects.requireNonNull(metadata, "Metadata parameter metadata cannot be null");
return new OdfDocumentImpl(xmlDocument, metadata);
}

static final OdfDocument of(final ParseResult parseResult, final Metadata metadata) {
Objects.requireNonNull(parseResult, "ParseResult parameter parseResult cannot be null");
Objects.requireNonNull(metadata, "Metadata parameter metadata cannot be null");
return new OdfDocumentImpl(OdfXmlDocuments.odfXmlDocumentOf(parseResult), metadata);
}

static final OdfDocument of(final ParseResult parseResult) {
Objects.requireNonNull(parseResult, "ParseResult parameter parseResult cannot be null");
return new OdfDocumentImpl(OdfXmlDocuments.odfXmlDocumentOf(parseResult));
}

static final OdfDocument from(final InputStream docStream)
throws IOException, ParserConfigurationException, SAXException {
Objects.requireNonNull(docStream, "InputStream parameter docStream cannot be null");
@@ -50,6 +56,10 @@ static final OdfDocument from(final InputStream docStream)
private final OdfXmlDocument xmlDocument;
private final Metadata metadata;

private OdfDocumentImpl(final OdfXmlDocument xmlDocument) {
this(xmlDocument, null);
}

private OdfDocumentImpl(final OdfXmlDocument xmlDocument, final Metadata metadata) {
super();
this.xmlDocument = xmlDocument;
@@ -2,6 +2,7 @@

import java.util.Collection;

import org.openpreservation.odf.fmt.Formats;
import org.openpreservation.odf.pkg.OdfPackage;

public interface OpenDocument {
@@ -43,4 +44,6 @@ public interface OpenDocument {
* @return the ODF Package for the OpenDocument
*/
public OdfPackage getPackage();

public Formats getFormat();
}
@@ -5,6 +5,7 @@
import java.util.List;
import java.util.Objects;

import org.openpreservation.odf.fmt.Formats;
import org.openpreservation.odf.pkg.OdfPackage;
import org.openpreservation.odf.pkg.OdfPackageDocument;

@@ -59,6 +60,11 @@ public OdfPackage getPackage() {
return this.pkg;
}

@Override
public Formats getFormat() {
return (this.isPackage()) ? this.pkg.getDetectedFormat() : Formats.fromMime(this.document.getXmlDocument().getMimeType());
}

@Override
public int hashCode() {
return Objects.hash(document, pkg);
@@ -21,6 +21,7 @@
import org.openpreservation.messages.Message;
import org.openpreservation.messages.MessageFactory;
import org.openpreservation.messages.Messages;
import org.openpreservation.odf.document.Documents;
import org.openpreservation.odf.fmt.OdfFormats;
import org.openpreservation.odf.pkg.FileEntry;
import org.openpreservation.odf.pkg.Manifest;
@@ -84,7 +85,7 @@ public OdfPackage parsePackage(final InputStream toParse, final String name) thr
}

private ValidationReport validate(final OdfPackage odfPackage) {
-final ValidationReport report = ValidationReport.of(odfPackage.getName());
+final ValidationReport report = ValidationReport.of(odfPackage.getName(), Documents.openDocumentOf(odfPackage));
report.add(OdfFormats.MIMETYPE, checkMimeEntry(odfPackage));
if (!odfPackage.hasManifest()) {
report.add(OdfPackages.PATH_MANIFEST, FACTORY.getError("PKG-4"));