Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_ #442

Open
uniqueSkeeter opened this issue Dec 25, 2024 · 14 comments

Comments

@uniqueSkeeter

2024-12-25 17:48:45.815 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
2024-12-25 17:49:33.237 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
Recognizing layout: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.11s/it]
Detecting bboxes: 0it [00:00, ?it/s]
Recognizing equations: 0it [00:00, ?it/s]
Recognizing tables: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.83it/s]
2024-12-25 17:49:55.155 Uncaught app execution
Traceback (most recent call last):
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/marker_app.py", line 136, in <module>
    rendered = convert_pdf(
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/marker_app.py", line 37, in convert_pdf
    return converter(fname)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/marker/converters/pdf.py", line 109, in __call__
    processor(document)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/marker/processors/debug.py", line 60, in __call__
    self.draw_layout_debug_images(document)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/marker/processors/debug.py", line 109, in draw_layout_debug_images
    self.render_on_image(line_bboxes, png_image, labels=line_text, color="black", draw_bbox=False, label_font_size=24)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/marker/processors/debug.py", line 173, in render_on_image
    label_font = ImageFont.truetype(font_path, label_font_size)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/PIL/ImageFont.py", line 834, in truetype
    return freetype(font)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/PIL/ImageFont.py", line 831, in freetype
    return FreeTypeFont(font, size, index, encoding, layout_engine)
  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/PIL/ImageFont.py", line 257, in __init__
    self.font = core.getfont(
OSError: locations (loca) table missing

@10179013

I also encountered this problem

@flight505

I am getting this as well, and I am not sure where it is coming from. I have Marker in a Streamlit app with LightRAG, and I noticed someone else mentioning the issue on the LightRAG GitHub.

@Godplayer

I am also getting the same error

@Bob080812

I am also getting the same error

@Mehdi-GASMI

same for me

@Franky5831

Same here

@paulo-maia

ditto

@franklinthony

Same for me

@ahming

ahming commented Jan 8, 2025

same for me

@VikParuchuri
Owner

VikParuchuri commented Jan 8, 2025

"Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_" is a warning, and it only appears in Streamlit. I'm guessing it's because Streamlit reloads everything, but I haven't looked into it deeply. Things still work fine with this warning, though.

Your traceback shows a separate error:

  File "/root/anaconda3/envs/marker/lib/python3.10/site-packages/PIL/ImageFont.py", line 257, in __init__
    self.font = core.getfont(
OSError: locations (loca) table missing

This is related to the missing debug image fonts. It only appears when you enable the debug option. I can look into this.
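Going by the error text alone, FreeType rejected the font file because its table directory has no `loca` table, which TrueType-outline fonts (those with a `glyf` table) need in order to locate glyph data. A font file that is truncated, corrupted, or not actually a font is a plausible cause, but that is an assumption; nothing in this thread confirms it. A stdlib-only sketch that reads the OpenType table directory and reports such problems, using hypothetical helper names (`read_table_directory`, `font_problems`, `check_font_file`) that are not part of Marker or Pillow:

```python
import struct
from pathlib import Path

# sfnt version tags: TrueType 1.0, CFF-based 'OTTO', and Apple 'true'
_SFNT_VERSIONS = (0x00010000, 0x4F54544F, 0x74727565)


def read_table_directory(data: bytes) -> dict:
    """Parse an OpenType/TrueType table directory into {tag: (offset, length)}."""
    if len(data) < 12:
        raise ValueError("file too short to contain an sfnt header")
    sfnt_version, num_tables = struct.unpack(">IH", data[:6])
    if sfnt_version not in _SFNT_VERSIONS:
        raise ValueError(f"not an sfnt font (version 0x{sfnt_version:08X})")
    tables = {}
    for i in range(num_tables):
        start = 12 + 16 * i  # 12-byte header, then 16-byte table records
        record = data[start:start + 16]
        if len(record) < 16:
            raise ValueError("table directory is truncated")
        tag, _checksum, offset, length = struct.unpack(">4sIII", record)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def font_problems(data: bytes) -> list:
    """Return reasons FreeType might reject this font (empty list if none found)."""
    try:
        tables = read_table_directory(data)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    # TrueType outlines ('glyf') need the 'loca' index to find each glyph.
    if "glyf" in tables and "loca" not in tables:
        problems.append("locations (loca) table missing")
    for tag, (offset, length) in tables.items():
        if offset + length > len(data):
            problems.append(f"table '{tag}' extends past end of file (truncated?)")
    return problems


def check_font_file(path: str) -> list:
    """Convenience wrapper: check a font file on disk."""
    return font_problems(Path(path).read_bytes())
```

Running `check_font_file(...)` on the `font_path` that `ImageFont.truetype` receives in the traceback should return an empty list for a structurally sound font; any reported problem suggests replacing that file.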

@flight505

@VikParuchuri this is also related to running Marker in Streamlit: WARNING streamlit.runtime.scriptrunner_utils.script_run_context: Thread 'MainThread': missing ScriptRunContext! This warning can be ignored when running in bare mode. While the application and Marker are working as they should, the warning is repeated 50 times and I could not suppress it. There is also this warning: WARNING streamlit:
Warning: to view a Streamlit app on a browser, use Streamlit in a file and
run it with the following command:

streamlit run [FILE_NAME] [ARGUMENTS]

I normally run Streamlit from my IDE, but I also tested in iTerm2 and the warnings persist, as they do when I use uv run app.py && streamlit run app.py to create a temporary environment and run the app directly in it. They are not causing any problems directly; I just can't track them down. Are you familiar with any of these issues?

import logging
from pathlib import Path

from termcolor import colored  # assumption: colored() comes from termcolor (import not shown originally)

logger = logging.getLogger(__name__)


class MarkerConverter(PDFConverter):  # PDFConverter: the app's own base class (not shown)
    """PDF converter using Marker"""

    def __init__(self):
        """Initialize Marker converter"""
        try:
            from marker.converters.pdf import PdfConverter
            from marker.models import create_model_dict
            from marker.config.parser import ConfigParser

            # Configure Marker settings with enhanced equation detection
            config = {
                "output_format": "markdown",
                "layout_analysis": True,
                "detect_equations": True,
                "equation_detection_confidence": 0.3,
                "detect_inline_equations": True,
                "detect_tables": True,
                "detect_lists": True,
                "detect_code_blocks": True,
                "detect_footnotes": True,
                "equation_output": "latex",
                "preserve_math": True,
                "equation_detection_mode": "aggressive",
                "equation_context_window": 3,
                "equation_pattern_matching": True,
                "equation_symbol_extraction": True,

                # Enhanced header handling
                "header_detection": {
                    "enabled": True,
                    "style": "atx",  # Use # style headers
                    "levels": {
                        "title": 1,       # Title uses single #
                        "section": 2,     # Sections use ##
                        "subsection": 3   # Subsections use ###
                    },
                    "remove_duplicate_markers": True
                },

                # Enhanced list handling
                "list_detection": {
                    "enabled": True,
                    "unordered_marker": "-",   # Use - for unordered lists
                    "ordered_marker": "1.",    # Use 1. for ordered lists
                    "preserve_numbers": True,  # Keep original list numbers
                    "indent_spaces": 2         # Use 2 spaces for indentation
                },

                # Layout and formatting
                "layout": {
                    "paragraph_breaks": True,
                    "line_spacing": 2,
                    "remove_redundant_whitespace": True,
                    "preserve_line_breaks": True,
                    "preserve_blank_lines": True
                },

                # Content preservation
                "preserve": {
                    "links": True,
                    "tables": True,
                    "images": True,
                    "footnotes": True,
                    "formatting": True,
                    "lists": True,
                    "headers": True
                },

                # Output settings
                "output": {
                    "format": "markdown",
                    "save_markdown": True,
                    "save_text": True,
                    "markdown_ext": ".md",
                    "text_ext": ".txt"
                }
            }

            config_parser = ConfigParser(config)

            # Initialize converter with config
            self._converter = PdfConverter(
                config=config_parser.generate_config_dict(),
                artifact_dict=create_model_dict(),
                processor_list=config_parser.get_processors(),
                renderer=config_parser.get_renderer()
            )

            logger.info("Marker initialized with optimized settings")

        except Exception as e:
            logger.error(f"Failed to initialize Marker: {str(e)}")
            print(colored(f"⚠️ Failed to initialize Marker: {str(e)}", "yellow"))
            raise

    def extract_text(self, file_path: str) -> str:
        """Extract text with semantic structure preservation"""
        try:
            # Process PDF with Marker
            rendered = self._converter(file_path)

            # Save markdown file
            markdown_path = str(Path(file_path).with_suffix('.md'))

            # Extract text from rendered output
            if hasattr(rendered, 'markdown'):
                text = rendered.markdown
                # Save markdown content ...
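Judging only by the wording of the two warnings quoted in that comment, both are what Streamlit reports when a script runs outside `streamlit run` ("bare mode"), so the `uv run app.py` half of that command line may be what triggers them; that is an inference, not something confirmed in this thread. If they still need to be hidden, a standard `logging.Filter` attached to the emitting logger drops matching records before any handler sees them, independent of the logger's level. The sketch assumes the logger name printed in the warning is the logger that emits it; `DropMessageFilter`, `silence_message`, and `SCRIPT_RUN_CONTEXT_LOGGER` are hypothetical names.

```python
import logging

# Logger name copied from the warning text; assumed to be the emitting logger.
SCRIPT_RUN_CONTEXT_LOGGER = "streamlit.runtime.scriptrunner_utils.script_run_context"


class DropMessageFilter(logging.Filter):
    """Reject log records whose formatted message contains a given substring."""

    def __init__(self, needle: str) -> None:
        super().__init__()
        self.needle = needle

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; other messages pass through untouched.
        return self.needle not in record.getMessage()


def silence_message(logger_name: str, needle: str) -> DropMessageFilter:
    """Attach a filter to the named logger and return it so it can be removed later."""
    flt = DropMessageFilter(needle)
    logging.getLogger(logger_name).addFilter(flt)
    return flt
```

Calling `silence_message(SCRIPT_RUN_CONTEXT_LOGGER, "missing ScriptRunContext")` once near the top of the app would, under that assumption, hide only that warning; `logging.getLogger(name).removeFilter(flt)` undoes it.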

@alibabadoufu

Same here. I am getting the same error

@vandant1

vandant1 commented Jan 8, 2025

Same! I also get this error when running the Streamlit code. In addition, I get the following error and don't understand why: Error during detection: operands could not be broadcast together with shapes (1,4) () (3,)
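That second message is NumPy's broadcasting error. NumPy compares operand shapes from the trailing dimension backwards; at each position the sizes must be equal or 1, and a shape that has run out of dimensions counts as 1. Here the trailing sizes are 4, (none) and 3, so `(1,4)` and `(3,)` cannot be combined, while the `()` operand is a scalar and broadcasts with anything. The thread does not show where in the pipeline this happens. A pure-Python sketch of the rule (the helpers `broadcast_shape` and `_fmt` are hypothetical and only mirror NumPy's documented behavior):

```python
from itertools import zip_longest


def _fmt(shape) -> str:
    """Format a shape compactly as in the log, e.g. (1,4), (), (3,)."""
    inner = ",".join(str(d) for d in shape)
    return f"({inner},)" if len(shape) == 1 else f"({inner})"


def broadcast_shape(*shapes):
    """Return the common broadcast shape, or raise ValueError as NumPy does."""
    result = []
    # Walk dimensions right-to-left; a shape that has run out counts as size 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}  # all sizes other than 1 must agree
        if len(sizes) > 1:
            raise ValueError(
                "operands could not be broadcast together with shapes "
                + " ".join(_fmt(s) for s in shapes)
            )
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))
```

`broadcast_shape((1, 4), (), (4,))` returns `(1, 4)`, while `broadcast_shape((1, 4), (), (3,))` raises the same kind of error as in the log.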

@Daryl149

I got this too; it seems Marker and Streamlit do not work well together.
