-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the default Jieba dict for Chinese search if not set #13005
Use the default Jieba dict for Chinese search if not set #13005
Conversation
AFAICT, this would leave a file descriptor opened ( |
@picnixz yes, because the default dict is installed with jieba, so we can directly get its path using |
sphinx/search/zh.py
Outdated
@@ -234,7 +234,10 @@ def __init__(self, options: dict[str, str]) -> None: | |||
|
|||
def init(self, options: dict[str, str]) -> None: | |||
if JIEBA: | |||
dict_path = options.get('dict') | |||
default_dict_path = os.path.join( | |||
os.path.dirname(jieba.__file__), 'dict.txt' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the jieba constant of dict.txt
instead? (I think there is a constant with that name) or is this constant not meant to be publicly available?
If we want to only rely on public API and not on dunder attributes, we could open the file using a with
statement, get the filename and then close it. It would probably be costly on Windows machines because open() calls can be up to 15x times slower compared to Linux but that would probably the second cleanest way to simplify maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can use
jieba.DEFAULT_DICT_NAME
to get the default dict name, but without its parent directory path. - We can use
inspect.getfile(jieba)
to get the path of the modulejieba
. - Finally we can use
os.path.join(os.path.dirname(inspect.getfile(jieba)), jieba.DEFAULT_DICT_NAME)
to get the path of the path of default dict file, which is not relying on dunder attributes.
One more thing, I find the inspect
provide a 'public API' inspect.getabsfile
which can get the absolute path of specify module, but it has not documented for 10 years. cpython#56526
If a not documented API is not a public API, it's better to use inspect.getfile
instead, I think.
But at the same time, Github Copilot says that inspect.getfile
may return a relative path, but I have not found any evidence on the Internet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified my code, please review again. @picnixz
sphinx/search/zh.py
Outdated
@@ -2,6 +2,7 @@ | |||
|
|||
from __future__ import annotations | |||
|
|||
import inspect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this import in the function it is being used just to save a bit of import time (unless this is already imported by some other dependency)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use inspect here? It's a very heavy package, we can just use .__file__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use a public API but maybe it's an overkill (my bad since it was my suggestion :')).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just revert to 46753b9 and dynamically get the dict file using jieba.DEFAULT_DICT_NAME
is enough, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use inspect here? It's a very heavy package, we can just use
.__file__
?
Should I modify the code to meet it?
Can we have this import in the function it is being used just to save a bit of import time (unless this is already imported by some other dependency)?
May be that's a compromise, but I already have no better idea about how to make a balance between using public API and avoiding importing a heavy package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just use the dunder attribute directly. Sorry for the back and forth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just use the dunder attribute directly. Sorry for the back and forth
Done! Please review it again.
This reverts commit 7f05775.
Subject: if user do not set the dict option, fallback to the default dict provided by jieba
Feature or Bugfix
Purpose
html_search_language
value iszh
Detail
jieba.get_dict_file().name
Relates