Data is not split according to the schema #625

Open
luyi661 opened this issue Jan 10, 2025 · 0 comments
Labels
question Further information is requested

luyi661 commented Jan 10, 2025

Hello, when I use ie2instruction/convert_func.py to convert my data, the data is not split as expected.
My command is:
python ie2instruction/convert_func.py \
    --src_path data/EEA/sample_test.json \
    --tgt_path data/EEA/test.json \
    --schema_path data/EEA/schema.json \
    --language zh \
    --task EEA \
    --split_num 4 \
    --split test

My sample_test.json is modeled on DeepKE/example/lm/InstructKGC/data/EEA/sample.json.
However, the generated data is not split according to split_num 4 (a correctly formatted test.json is produced; it is just not split).

My schema.json is modeled on DeepKE/example/lm/InstructKGC/data/EEA/schema.json, and its contents are as follows:

["财经/金融-抵押/借款/借贷"]
["证券代码","公告日期","公告类型","借款公司名称","借款公司与上市公司关系","贷款银行详情","是否银团","贷款银行","贷款银行数","银行所在地","是否签约","贷款进程","签约日期","币种编码","币种","贷款金额上限","贷款金额下限","贷款期限","贷款起始日期","贷款结束日期","基准利率(%)","贷款利率(%)","贷款浮动利率","贷款类型","是否有抵押质押物","资金投向","公告标题","披露情况"]
{"财经/金融-抵押/借款/借贷": ["证券代码","公告日期","公告类型","借款公司名称","借款公司与上市公司关系","贷款银行详情","是否银团","贷款银行","贷款银行数","银行所在地","是否签约","贷款进程","签约日期","币种编码","币种","贷款金额上限","贷款金额下限","贷款期限","贷款起始日期","贷款结束日期","基准利率(%)","贷款利率(%)","贷款浮动利率","贷款类型","是否有抵押质押物","资金投向","公告标题","披露情况"]}
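As a quick sanity check on the three lines above, the sketch below parses each line as JSON and reports how many entries the first two hold. It assumes the file consists of exactly three JSON lines in the order shown (type list, role list, type-to-role mapping). How convert_func.py itself reads the file is not shown in this issue, and `load_schema_lines` is a hypothetical helper, not part of DeepKE.

```python
import json


def load_schema_lines(text):
    """Parse a three-line schema: type list, role list, type-to-role mapping."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 non-empty lines, got {len(lines)}")
    type_list, role_list, type_role_dict = (json.loads(line) for line in lines)
    return type_list, role_list, type_role_dict


# Abbreviated copy of the schema above (only the first three roles are kept).
sample = "\n".join([
    '["财经/金融-抵押/借款/借贷"]',
    '["证券代码", "公告日期", "公告类型"]',
    '{"财经/金融-抵押/借款/借贷": ["证券代码", "公告日期", "公告类型"]}',
])

type_list, role_list, type_role_dict = load_schema_lines(sample)
print("event types:", len(type_list))
print("roles:", len(role_list))
```

With the full file, the first line holds one event type and the second holds 28 roles, so which of the two is passed as `schemas` decides whether a split_num of 4 has anything to split.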

I added print statements to the splitting function in convert_func.py; the code is as follows:

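def multischema_split_by_num_test(schemas, split_num=4):
    # Debug output: show what the function actually receives.
    print(schemas)
    print("len", len(schemas))
    # Fewer schemas than one full chunk, or splitting disabled (-1): return a single chunk.
    if len(schemas) < split_num or split_num == -1:
        return [schemas]

    # Number of items covered by full chunks (a multiple of split_num).
    negative_length = max(len(schemas) // split_num, 1) * split_num
    total_schemas = []
    for i in range(0, negative_length, split_num):
        total_schemas.append(schemas[i:i + split_num])

    # Leftover items form their own chunk only if there are at least
    # max(1, split_num // 2) of them; otherwise they join the last chunk.
    remain_len = max(1, split_num // 2)
    if len(schemas) - negative_length >= remain_len:
        total_schemas.append(schemas[negative_length:])
    else:
        total_schemas[-1].extend(schemas[negative_length:])
    return total_schemas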

The printed output (attached as a screenshot) shows a length of 1, so no splitting takes place. What is going wrong here?
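The early-return branch of the function above can be exercised in isolation. The sketch below repeats the same logic without the debug prints, under the hypothetical name `split_by_num`, and calls it once with a one-element list and once with a 28-element list. Which list convert_func.py actually passes for the EEA task is not shown in this issue, so treating the one-element event-type list as the input is only an assumption that would be consistent with the printed length of 1.

```python
def split_by_num(schemas, split_num=4):
    # Same logic as multischema_split_by_num_test, minus the prints.
    if len(schemas) < split_num or split_num == -1:
        return [schemas]
    negative_length = max(len(schemas) // split_num, 1) * split_num
    total_schemas = [
        schemas[i:i + split_num] for i in range(0, negative_length, split_num)
    ]
    remain_len = max(1, split_num // 2)
    if len(schemas) - negative_length >= remain_len:
        total_schemas.append(schemas[negative_length:])
    else:
        total_schemas[-1].extend(schemas[negative_length:])
    return total_schemas


one_type = ["财经/金融-抵押/借款/借贷"]
# Placeholder names standing in for the 28 roles in schema.json.
roles = [f"role_{i}" for i in range(28)]

print(len(split_by_num(one_type, 4)))  # 1: the list is shorter than split_num
print(len(split_by_num(roles, 4)))     # 7: seven chunks of four roles each
```

If the function does receive the one-element list, the `len(schemas) < split_num` check returns it unsplit, which would explain the observation; a 28-element input would instead yield seven chunks.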

luyi661 added the question (Further information is requested) label on Jan 10, 2025