Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the function to embed images ICC profile #1437

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

xh4010
Copy link

@xh4010 xh4010 commented Mar 16, 2023

By adding two asynchronous methods, imageWithICC and embedWithICC, it is now possible to embed the ICC configuration file into a PDF file if the JPG image contains one.

  • Unit Tests
  • Documentation
  • Update CHANGELOG.md
  • Ready to be merged

@blikblum
Copy link
Member

Some considerations:

  • In EXIF Orientation Support #1379 a dependency to jpeg-exif is added. We should add only one exit package
  • Adding an async method deviates from the current API. We should try to keep with the current approach (BTW: given that the image is already loaded in memory is possible to read exif info synchronously )
  • See lint issues

@fstrube
Copy link

fstrube commented Mar 17, 2023

You can try using the jpeg-exif package. It has a sync method for parsing EXIF data.

@xh4010
Copy link
Author

xh4010 commented Mar 17, 2023

Hi @blikblum @fstrube

  • When embedding an ICC profile in a PDF, it's necessary to extract the raw data of the ICC profile and write it to the PDF stream. However, jpeg-exif doesn't support this operation and can only return concise exif information. To address this, exifr provides a more advanced API.

  • pdfkit API is synchronous while exifr only offers asynchronous methods. If you refer to the Exifr source code to modify this feature into a synchronous method, it will take a long time. To improve usability, I have added asynchronous methods. For most users, whether or not to embed the ICC is not an issue. However, our PDFs will be used for printing, and therefore consistent colors must be maintained. (See the result of embedding ICC profile.)

  • fixed the lint issues

@fstrube
Copy link

fstrube commented Mar 19, 2023

@xh4010 I see where jpeg-exif falls short, and it looks like exifr doesn't even do what we really want, as you're having to reconstruct the ICC profile from the app segments. I propose we parse out the ICC profile from the app segments ourselves so we can prevent introducing async functions.

I was able to get this working by modifying the jpeg.js source very similarly to your approach. The snippet below demonstrates that it is possible with minimal code and no further dependencies. Feel free to take this code and modify or improve it.

@blikblum I'm curious what your thoughts are on this approach.

const MARKERS = [0xffc0, 0xffc1, 0xffc2, 0xffc3, 0xffc5, 0xffc6, 0xffc7, 0xffc8, 0xffc9, 0xffca, 0xffcb, 0xffcc, 0xffcd, 0xffce, 0xffcf];
const COLOR_SPACE_MAP = {
  1: 'DeviceGray',
  3: 'DeviceRGB',
  4: 'DeviceCMYK'
};

class JPEG {
  constructor(data, label) {
    let marker;
    this.data = data;
    this.label = label;

    if (this.data.readUInt16BE(0) !== 0xffd8) {
      throw 'SOI not found in JPEG';
    }

    let pos = 2;

    while (pos < this.data.length) {
      marker = this.data.readUInt16BE(pos);
      pos += 2;

      if (MARKERS.includes(marker)) {
        break;
      }

      pos += this.data.readUInt16BE(pos);
    }

    if (!MARKERS.includes(marker)) {
      throw 'Invalid JPEG.';
    }

    pos += 2;
    this.bits = this.data[pos++];
    this.height = this.data.readUInt16BE(pos);
    pos += 2;
    this.width = this.data.readUInt16BE(pos);
    pos += 2;
    // const channels = this.data[pos++];
    this.channels = this.data[pos++];
    this.colorSpace = COLOR_SPACE_MAP[this.channels];
    this.obj = null;

    // Parse an embedded ICC profile
    pos = 2;
    while (pos < this.data.length) {
      marker = this.data.readUInt16BE(pos);
      pos += 2;

      if (marker === 0xffe2) {
        break;
      }

      pos += this.data.readUInt16BE(pos);
    }

    if (marker !== 0xffe2) {
      console.log('no ICC profile found')
    } else {
      let length = this.data.readUInt16BE(pos);
      pos += 2;
      // Hardcoded header length of 14 bytes, not entirely sure why
      this.iccProfile = this.data.slice(pos, pos + length).slice(14);
    }
  }

  embed(document) {
    if (this.obj) {
      return;
    }

    let colorSpace = this.colorSpace;

    if (this.iccProfile) {
      let profile = document.ref({
        Alternate: colorSpace,
        N: this.channels,
        Length: this.iccProfile.length,
      });
      profile.end(this.iccProfile);

      colorSpace = document.ref([`ICCBased ${profile}`])
      colorSpace.end();
    }

    this.obj = document.ref({
      Type: 'XObject',
      Subtype: 'Image',
      BitsPerComponent: this.bits,
      Width: this.width,
      Height: this.height,
      ColorSpace: colorSpace,
      Filter: 'DCTDecode'
    }); // add extra decode params for CMYK images. By swapping the
    // min and max values from the default, we invert the colors. See
    // section 4.8.4 of the spec.

    if (this.colorSpace === 'DeviceCMYK') {
      this.obj.data['Decode'] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0];
    }

    this.obj.end(this.data); // free memory

    return this.data = null;
  }

}

@blikblum
Copy link
Member

Lot better.

Only issue is testing if parsing is correct. But one or two integration tests are ok

@xh4010
Copy link
Author

xh4010 commented Mar 20, 2023

@fstrube You're right. Once we know the location and structure of the ICC profile, we can parse it ourselves without relying on other libraries. We have more work to do because the ICC profile may exist in multiple segments. Additionally, we need to verify that the extracted data is correct. Otherwise, writing it directly could result in an erroneous PDF.

I will find some time to make the modifications. Thank you.

@xh4010
Copy link
Author

xh4010 commented Mar 20, 2023

Hi @blikblum @fstrube

I removed the exifr package and created a function extractICCProfile in the jpeg .js file to extract and parse the ICC Profile, added an option parameter embedICCProfile (default: true), and the test was correct.

let doc = new PDFDocument(); 
doc.pipe(fs.createWriteStream('test.pdf'));
doc.image('tests/images/landscape+icc.jpg', 40, 70, {
  width: 200,
  height: 267,
  embedICCProfile: true
});
doc.end();

@fstrube
Copy link

fstrube commented Mar 20, 2023

@xh4010 I like this a lot better!

My initial thoughts are that you should abstract the ICC parsing into a separate ICCProfile class. Maybe structure it something like the snippet below. Can you reintroduce the support for PNG's with embedded profiles?

class ICCProfile {
    static extractFromJPEG(jpeg) {
        // ...
    }

    static extractFromPNG(png) {
        // ...
    }

    constructor(buffer) {
        this.buffer = buffer
        this.data = this.parse(buffer)
    }

    parse(buffer) {
        // ...
    }

    embed(doc) {
        // ...
    }
}

Abstract the ICC parsing into a separate ICCProfile class
@xh4010
Copy link
Author

xh4010 commented Mar 21, 2023

@fstrube
I have made modifications according to your suggestions.
Support JPEG/PNG to embed ICC Profile if exists into PDF

@nickgraz
Copy link

@xh4010 great work here! Is there anything I can help with to get this PR merged in?

static extractFromJPEG(jpeg) {
let pos = 2;
const buffers = [];
while (pos < jpeg.length) {
Copy link

@jpohlmann-artifact jpohlmann-artifact Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this gives me an error, because const marker = jpeg.readUInt16BE(pos); only takes a max offset of length - 2 https://www.geeksforgeeks.org/node-js-buffer-readuint16be-method/.

If I change this to while (pos < jpeg.length - 4) {, it finishes sucessfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants