
Large blank areas after translation, and formulas are strangely flipped #958

@ouhongtao2024

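Before asking...

  • I have searched the existing issues
  • I spent at least 5 minutes thinking about and preparing this question
  • I have carefully read the wiki in full
  • I have carefully checked that the problem is unrelated to my network environment (including but not limited to Google being unavailable or model downloads failing)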

Environment

- **OS**: Windows 11 x86
- **python**: 3.11.9
- **pdf2zh**: 1.9.9
- **pip_list**:

Package                        Version
------------------------------ -----------
aiofiles                       23.2.1
annotated-types                0.7.0
anyio                          4.6.2.post1
azure-ai-translation-text      1.0.1
azure-core                     1.32.0
BabelDOC                       0.2.30
bitarray                       3.3.0
bitstring                      4.3.1
certifi                        2024.8.30
cffi                           1.17.1
charset-normalizer             3.4.0
click                          8.1.7
click-default-group            1.2.4
colorama                       0.4.6
coloredlogs                    15.0.1
ConfigArgParse                 1.7
contourpy                      1.3.2
cryptography                   44.0.0
cycler                         0.12.1
decorator                      5.1.1
deepl                          1.18.0
Deprecated                     1.2.18
distro                         1.9.0
docformatter                   1.7.5
et_xmlfile                     2.0.0
fastapi                        0.115.5
ffmpy                          0.4.0
filelock                       3.16.1
flatbuffers                    24.3.25
fonttools                      4.56.0
freetype-py                    2.5.1
fsspec                         2024.10.0
gradio                         5.7.0
gradio_client                  1.5.0
gradio_pdf                     0.0.21
h11                            0.14.0
httpcore                       1.0.7
httpx                          0.27.2
huggingface-hub                0.30.1
humanfriendly                  10.0
idna                           3.10
imageio                        2.37.0
isodate                        0.7.2
Jinja2                         3.1.4
jiter                          0.8.0
kiwisolver                     1.4.8
lazy_loader                    0.4
Levenshtein                    0.27.1
lxml                           5.3.1
markdown-it-py                 3.0.0
MarkupSafe                     2.1.5
matplotlib                     3.10.1
mdurl                          0.1.2
mpmath                         1.3.0
msgpack                        1.1.0
networkx                       3.4.2
numpy                          2.2.4
ollama                         0.4.1
onnx                           1.17.0
onnxruntime                    1.20.1
openai                         1.70.0
opencv-python                  4.11.0.86
opencv-python-headless         4.10.0.84
openpyxl                       3.1.5
orjson                         3.10.16
packaging                      24.2
pandas                         2.2.3
pdf2zh                         1.9.9
pdfminer.six                   20250416
peewee                         3.17.9
pikepdf                        9.5.2
pillow                         11.0.0
pip                            25.0.1
protobuf                       5.29.0
psutil                         7.0.0
pyclipper                      1.3.0.post6
pycosat                        0.6.6
pycparser                      2.22
pydantic                       2.11.1
pydantic_core                  2.33.0
pydub                          0.25.1
pygame                         2.6.1
Pygments                       2.18.0
PyMuPDF                        1.25.2
pyparsing                      3.2.3
pyreadline3                    3.5.4
PySocks                        1.7.1
pyte                           0.8.2
python-dateutil                2.9.0.post0
python-Levenshtein             0.27.1
python-multipart               0.0.12
pytz                           2024.2
PyYAML                         6.0.2
RapidFuzz                      3.12.2
rapidocr-onnxruntime           1.4.4
regex                          2024.11.6
requests                       2.32.3
rich                           13.9.4
ruff                           0.8.0
safehttpx                      0.1.1
scikit-image                   0.25.2
scipy                          1.15.2
semantic-version               2.10.0
shapely                        2.0.7
shellingham                    1.5.4
six                            1.16.0
sniffio                        1.3.1
socksio                        1.0.0
starlette                      0.41.3
sympy                          1.13.3
tenacity                       9.0.0
tencentcloud-sdk-python        3.0.1285
tencentcloud-sdk-python-common 3.0.1353
tencentcloud-sdk-python-tmt    3.0.1353
tifffile                       2025.3.30
tiktoken                       0.9.0
toml                           0.10.2
tomlkit                        0.12.0
toposort                       1.10
tqdm                           4.67.1
typer                          0.13.1
typing_extensions              4.12.2
typing-inspection              0.4.0
tzdata                         2024.2
untokenize                     0.1.1
urllib3                        2.2.3
uvicorn                        0.32.1
wcwidth                        0.2.13
websockets                     12.0
win_unicode_console            0.5
wrapt                          1.17.2
xinference-client              1.4.0
xsdata                         24.12

Installation method

pip

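Describe your problem

  • At first I translated using DeepSeek as the translation service, and the output had large blank areas and strangely flipped formulas.
    I then switched to the default Google translation and got the same result. Oddly, neither run printed any error in the terminal

  • I have updated to the latest version and the problem persists

  • I had seen other bugs caused by the PDF viewer used to open the file, so I tried several viewers myself, but opening the output in Edge, SumatraPDF, and TeXworks made no difference

  • I then asked a friend to open it in Chrome on a Mac, and it still did not display correctly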

How to reproduce

Run

pdf2zh .\icml01-ffq.pdf

or

pdf2zh .\icml01-ffq.pdf -s deepseek

Expected behavior

I expect the text to be translated normally; garbled formulas would be a minor issue by comparison

Relevant logs

Running in PowerShell 7.5.1 gives:

PRTS G:\DownLoad_From_Edge\PDFs_From_Internet 20:15 ◆ <◆> System  10s
❖ pdf2zh .\icml01-ffq.pdf -s deepseek
not in git repo
Namespace(files=['.\\icml01-ffq.pdf'], debug=False, pages=None, vfont='', vchar='', lang_in='en', lang_out='zh', service='deepseek', output='', thread=4, interactive=False, share=False, flask=False, celery=False, authorized=None, prompt=None, compatible=False, onnx=None, serverport=None, dir=False, config=None, babeldoc=False, skip_subset_fonts=False, ignore_cache=False, mcp=False, sse=False)
[05/26/25 20:16:10] INFO     INFO:pdf2zh.high_level:use font:                                          high_level.py:423
                             C:/Users/区涛/.cache/babeldoc/fonts/SourceHanSerifCN-Regular.ttf
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00,  1.04s/it]
PRTS G:\DownLoad_From_Edge\PDFs_From_Internet 20:16 ◆ <◆> System  24s
❖ pdf2zh .\icml01-ffq.pdf
not in git repo
Namespace(files=['.\\icml01-ffq.pdf'], debug=False, pages=None, vfont='', vchar='', lang_in='en', lang_out='zh', service='google', output='', thread=4, interactive=False, share=False, flask=False, celery=False, authorized=None, prompt=None, compatible=False, onnx=None, serverport=None, dir=False, config=None, babeldoc=False, skip_subset_fonts=False, ignore_cache=False, mcp=False, sse=False)
[05/26/25 20:17:10] INFO     INFO:pdf2zh.high_level:use font:                                          high_level.py:423
                             C:/Users/区涛/.cache/babeldoc/fonts/SourceHanSerifCN-Regular.ttf
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:49<00:00,  6.17s/it]
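
Since neither run reports an error, one quick sanity check is to confirm that the two runs really differ only in the translation service. The following is a minimal sketch, assuming the `Namespace(...)` lines are copied from the logs above (abbreviated here to a subset of fields). `parse_namespace` and `diff_settings` are hypothetical helper names for illustration and are not part of pdf2zh.

```python
import ast


def parse_namespace(line: str) -> dict:
    """Turn an argparse ``Namespace(...)`` repr into a plain dict."""
    call = ast.parse(line.strip(), mode="eval").body
    if not (isinstance(call, ast.Call) and getattr(call.func, "id", None) == "Namespace"):
        raise ValueError("not a Namespace(...) line")
    # Every field in the repr is a keyword argument holding a Python literal.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def diff_settings(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated copies of the two Namespace lines from the logs (subset of fields).
DEEPSEEK_RUN = (
    r"Namespace(files=['.\\icml01-ffq.pdf'], debug=False, pages=None, "
    r"lang_in='en', lang_out='zh', service='deepseek', thread=4, "
    r"compatible=False, babeldoc=False, ignore_cache=False)"
)
GOOGLE_RUN = (
    r"Namespace(files=['.\\icml01-ffq.pdf'], debug=False, pages=None, "
    r"lang_in='en', lang_out='zh', service='google', thread=4, "
    r"compatible=False, babeldoc=False, ignore_cache=False)"
)

changed = diff_settings(parse_namespace(DEEPSEEK_RUN), parse_namespace(GOOGLE_RUN))
print(changed)  # {'service': ('deepseek', 'google')}
```

For these two lines the only differing setting is `service`, which is consistent with the observation that the blank areas and flipped formulas appear regardless of the translation backend.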

Original PDF file

icml01-ffq_raw.pdf

Source download URL: https://www.researchgate.net/publication/2933305_Friend-or-Foe_Q-learning_in_General-Sum_Games

Anything else?

icml01-ffq-mono.pdf

icml01-ffq-dual.pdf
