Description
I have two cases where the shortcode-core plugin fails while generating the search index for the tntsearch plugin.
sh-5.1$ bin/plugin tntsearch index
Re-indexing
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php on line 339
Based on the previous issues in #53 and the code in thunderer/Shortcode#71, I have prepared two reproducible test cases: https://1drv.ms/f/s!AgnMn-haWyFrkcN50beuEO4A0m6PQw?e=GjwcEV
Test case 1 - HTML content
Test case 2 - Markdown content
From the Xdebug traces provided, you can see that the preg_match_all() call in the Thunderer Shortcode library uses almost 30MB when parsing test case 1. For test case 2 it is almost 80MB!
I'm not sure why the TNT Search command line parses our page as HTML in one case but as Markdown in the other. All of our pages are stored as Markdown files on disk. Maybe it has something to do with the HTML cache.
Anyway, this is what I see in the full Xdebug trace when running "bin/plugin tntsearch index", so the minimal reproducible cases in the ZIP files were prepared accordingly.
Snippet from full Xdebug session of HTML page parsing:
34.0595 96308304 -> preg_match_all($pattern = '~((?<string>\\\\.|(?:(?!\\[|\\]|\\/|\\=|\\"|\\s+).)+)|(?<ws>\\s+)|(?<marker>\\/)|(?<delimiter>\\")|(?<separator>\\=)|(?<open>\\[)|(?<close>\\]))~us', $subject = '<table>\n<thead>\n<tr>\n<th>Užduotis</th>\n<th>Aprašymas</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><a href="https://gidas.rivile.lt/rivile_akademija/rivile_gama/x_pamoka/darbuotoju_aprasymas_avanso_ismokejimas#1-uzduotis" target="_blank" rel="nofollow noopener noreferrer" class="external-link no-image">1 užduotis</a></td>\n<td>Kalendorius. Naujo kalendoriaus sukūrimas ir pildymas.<br/>Prieššventinės dienos sutrumpinimas</td>\n</tr>\n<tr>\n<td><a href="https://gidas.rivile.lt/rivile_akademija/rivile_gama/x_pamoka/'..., $matches = NULL, $flags = 258) /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:339
34.0672 126197600 -> preg_last_error() /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:340
Snippet from full Xdebug session of Markdown page parsing:
21.4159 83613968 -> preg_match_all($pattern = '~((?<string>\\\\.|(?:(?!\\[|\\]|\\/|\\=|\\"|\\s+).)+)|(?<ws>\\s+)|(?<marker>\\/)|(?<delimiter>\\")|(?<separator>\\=)|(?<open>\\[)|(?<close>\\]))~us', $subject = '| Kodas | Pavadinimas |\n| ------------------------------------------------------------ | --------------------------------------------------------- |\n| [I01_DKZR](#i01_dkzr-dk-žurnalų-sąrašas) | DK žurnalų sąrašas |\n| [I02_DKH](#i02_dkh-dk-hederis) | DK hederis |\n| [I'..., $matches = NULL, $flags = 258) /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:339
21.4462 131903672
TRACE END [2024-04-18 08:57:42.623960]
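For reference, here is a minimal Python sketch of the arithmetic on the two snippets above. It assumes the second column of each trace line is the memory usage in bytes at that point, and that the limit is the 134217728 bytes from the fatal error message. The names `snippets`, `delta_bytes` and `to_mib` are only illustrative. The numbers cover just the excerpts shown here; the full traces in the ZIP files may give different figures.

```python
# Memory limit reported in the PHP fatal error, in bytes.
MEMORY_LIMIT = 134217728

# (memory before preg_match_all, memory on the next trace line), in bytes,
# copied from the two trace snippets above.
snippets = {
    "html": (96308304, 126197600),
    "markdown": (83613968, 131903672),
}


def delta_bytes(before, after):
    """Memory growth between two trace lines, in bytes."""
    return after - before


def to_mib(num_bytes):
    """Convert bytes to MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(f"limit: {to_mib(MEMORY_LIMIT):.0f} MiB")
    for name, (before, after) in snippets.items():
        growth = delta_bytes(before, after)
        headroom = MEMORY_LIMIT - after
        print(
            f"{name}: +{growth} bytes ({to_mib(growth):.1f} MiB), "
            f"{to_mib(headroom):.1f} MiB left before the limit"
        )
```

In the Markdown snippet the last recorded value is only about 2 MiB below the limit, which fits with the trace ending right there.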
Sadly, the full Xdebug trace is very big, so it would be difficult to share.
I hope this is enough information to fix the problem.