
RegularParser fails with memory exhausted errors #121

@ViliusS


I have two cases where the shortcode-core plugin fails when generating the search index for the tntsearch plugin.

sh-5.1$ bin/plugin tntsearch index

Re-indexing

PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php on line 339

Based on the previous issue #53 and the code in thunderer/Shortcode#71, I have prepared two reproducible test cases: https://1drv.ms/f/s!AgnMn-haWyFrkcN50beuEO4A0m6PQw?e=GjwcEV

Test case 1 - HTML content
Test case 2 - Markdown content

From the Xdebug traces provided, you will see that the preg_match_all() call in the Thunderer Shortcode library uses almost 30 MB when parsing test case 1. For test case 2 it is almost 80 MB!

I'm not sure why the TNT Search command line parses our page as HTML in one case and as Markdown in the other. All of our pages are stored as Markdown files on disk. Maybe it has something to do with the HTML cache.
Anyway, this is what I see in the full Xdebug trace when running "bin/plugin tntsearch index", so the minimal reproducible cases in the ZIP files are prepared accordingly.

Snippet from full Xdebug session of HTML page parsing:

   34.0595   96308304                                                         -> preg_match_all($pattern = '~((?<string>\\\\.|(?:(?!\\[|\\]|\\/|\\=|\\"|\\s+).)+)|(?<ws>\\s+)|(?<marker>\\/)|(?<delimiter>\\")|(?<separator>\\=)|(?<open>\\[)|(?<close>\\]))~us', $subject = '<table>\n<thead>\n<tr>\n<th>Užduotis</th>\n<th>Aprašymas</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><a href="https://gidas.rivile.lt/rivile_akademija/rivile_gama/x_pamoka/darbuotoju_aprasymas_avanso_ismokejimas#1-uzduotis" target="_blank" rel="nofollow noopener noreferrer" class="external-link no-image">1 užduotis</a></td>\n<td>Kalendorius. Naujo kalendoriaus sukūrimas ir pildymas.<br/>Prieššventinės dienos sutrumpinimas</td>\n</tr>\n<tr>\n<td><a href="https://gidas.rivile.lt/rivile_akademija/rivile_gama/x_pamoka/'..., $matches = NULL, $flags = 258) /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:339
   34.0672  126197600                                                         -> preg_last_error() /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:340
  

Snippet from full Xdebug session of Markdown page parsing:

   21.4159   83613968                                                           -> preg_match_all($pattern = '~((?<string>\\\\.|(?:(?!\\[|\\]|\\/|\\=|\\"|\\s+).)+)|(?<ws>\\s+)|(?<marker>\\/)|(?<delimiter>\\")|(?<separator>\\=)|(?<open>\\[)|(?<close>\\]))~us', $subject = '| Kodas                                                        | Pavadinimas                                               |\n| ------------------------------------------------------------ | --------------------------------------------------------- |\n| [I01_DKZR](#i01_dkzr-dk-žurnalų-sąrašas)                     | DK žurnalų sąrašas                                        |\n| [I02_DKH](#i02_dkh-dk-hederis)                               | DK hederis                                                |\n| [I'..., $matches = NULL, $flags = 258) /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:339
   21.4462  131903672
TRACE END   [2024-04-18 08:57:42.623960]
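
For reference, the memory growth visible in the two snippets above can be read off by subtracting the memory column of the preg_match_all() line from that of the next trace line. Below is a minimal Python sketch of that arithmetic. The values are copied from the snippets and the limit from the fatal error; the helper name growth_bytes is hypothetical. Note that in the Markdown snippet the second value is the last record before TRACE END, so it is only the growth seen up to the point where the trace stops.

```python
# Memory column (bytes) of the trace line that enters preg_match_all() and of
# the next trace line, copied from the two snippets above.
SNIPPETS = {
    "HTML": (96308304, 126197600),
    "Markdown": (83613968, 131903672),
}

# Limit reported in the PHP fatal error message.
MEMORY_LIMIT = 134217728


def growth_bytes(before: int, after: int) -> int:
    """Bytes allocated between two consecutive trace lines (hypothetical helper)."""
    return after - before


for name, (before, after) in SNIPPETS.items():
    delta = growth_bytes(before, after)
    headroom = MEMORY_LIMIT - after
    print(f"{name}: +{delta} bytes ({delta / 2**20:.1f} MiB), "
          f"{headroom} bytes left under the limit")
```

These figures describe only the snippets from the full indexing session; the traces of the standalone test cases in the ZIP files start from a different baseline and may show different totals.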

Sadly, the full Xdebug trace is very large, so it would be difficult to share.

I hope this is enough information to fix the issue.
