:< separator in LDIF not being parsed correctly #16

Open
twiz718 opened this issue Feb 4, 2013 · 1 comment

twiz718 commented Feb 4, 2013

dn: MYDNHERE
sn: Khanin
givenName: Alex
whenCreated: 20080910232037.0Z
displayName: Khanin, Alex
department: MYDEPTHERE
sAMAccountName: myloginhere
mail: MYEMAILHERE
manager: MYMGRDNHERE
thumbnailPhoto:< file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY

The file referenced by that URL, /var/tmp/ldapsearch-thumbnailPhoto-S8oDGY, exists and is readable (it contains JPEG data).

Running LDAP::LDIF.parse_file() on this LDIF raises the following error:

irb(main):004:0> LDAP::LDIF.parse_file("/var/tmp/akhanin.ldif")
ArgumentError: invalid byte sequence in UTF-8
from org/jruby/RubyRegexp.java:1487:in `=~'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:105:in `unsafe_char?'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:323:in `parse_entry'
from org/jruby/RubyArray.java:1613:in `each'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:184:in `parse_entry'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:481:in `parse_file'
from org/jruby/RubyIO.java:1183:in `open'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:439:in `parse_file'
from (irb):4:in `evaluate'
from org/jruby/RubyKernel.java:1066:in `eval'
from org/jruby/RubyKernel.java:1392:in `loop'
from org/jruby/RubyKernel.java:1174:in `catch'
from org/jruby/RubyKernel.java:1174:in `catch'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:47:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:8:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands.rb:41:in `(root)'
from org/jruby/RubyKernel.java:1027:in `require'
from script/rails:6:in `(root)'

When I run "file" on that thumbnailPhoto I get the following:
ldapsearch-thumbnailPhoto-S8oDGY: JPEG image data, JFIF standard 1.01

If I remove the last line of the LDIF (the one with the thumbnailPhoto ":<" reference), it parses just fine.
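
For context, RFC 2849 (the LDIF specification) allows three value forms after the attribute name: ":" for a plain string, "::" for a base64 string, and ":<" for a URL whose contents become the value. The following is only a minimal sketch of how a parser could tell them apart. `read_ldif_value` is a hypothetical helper, not the jruby-ldap implementation, and it assumes the URL is a local file:// URL with no spaces in the path.

```ruby
require "base64"
require "uri"

# Hypothetical helper (not part of jruby-ldap): split one LDIF attribute
# line into [attribute, value], honouring the ":", "::" and ":<" separators.
def read_ldif_value(line)
  line = line.chomp
  case line
  when /\A([^:]+):<\s*(\S+)\s*\z/
    attr, url = $1, $2
    # Assumption: only file:// URLs are handled; read the file as raw bytes.
    [attr, File.binread(URI.parse(url).path)]
  when /\A([^:]+)::\s*(.*)\z/
    # "::" means the value is base64 encoded.
    [$1, Base64.decode64($2)]
  when /\A([^:]+):\s*(.*)\z/
    # Plain ":" means the value is a literal string.
    [$1, $2]
  else
    raise ArgumentError, "not an LDIF attribute line: #{line.inspect}"
  end
end
```

With a helper like this, the thumbnailPhoto line above would yield the raw JPEG bytes as a binary string, which is why any later check on that value has to cope with bytes that are not valid UTF-8.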


ghost commented Feb 5, 2013

The problem is that ruby-ldap was not written to work with UTF-8, and the unsafe_char? method fails when parsing this file:

# return *true* if +str+ contains a character with an ASCII value > 127 or
# a NUL, LF or CR. Otherwise, *false* is returned.
#
def LDIF.unsafe_char?( str )
  # This could be written as a single regex, but this is faster.
  str =~ /^[ :]/ || str =~ /[\x00-\x1f\x7f-\xff]/
end

Wikipedia:

ASCII was incorporated into the Unicode character set as the first 128 symbols, so the ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with ASCII, a significant advantage.

So the range \x00-\x1f is valid and passes, but \x7f-\xff is not valid in UTF-8 and should be replaced with another range, or perhaps several, though I do not know exactly which.
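
One possible direction, offered only as a sketch and not as a tested patch for ruby-ldap: do the check on raw bytes instead of with a regex, so the string's encoding never comes into play. `unsafe_bytes?` is a hypothetical name, and the assumption is that the intent of unsafe_char? (leading space or colon, any control byte, any byte >= 0x7f) should be kept as is.

```ruby
# Hypothetical replacement for the regex-based check (not ruby-ldap code).
# Returns true if +str+ starts with a space or a colon, or contains any
# byte in 0x00-0x1f or 0x7f-0xff. It works on bytes, so it does not raise
# on strings that are not valid UTF-8 (for example JPEG data).
def unsafe_bytes?(str)
  return false if str.empty?

  first = str.getbyte(0)
  return true if first == 0x20 || first == 0x3a # leading ' ' or ':'

  str.each_byte.any? { |b| b <= 0x1f || b >= 0x7f }
end
```

Because String#getbyte and String#each_byte ignore the encoding tag, binary data read through a ":<" reference is simply reported as unsafe instead of raising ArgumentError.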

Patches are welcome.
