
The usual way to add a series of list rows with a PHPWord template is to call cloneRow on the target row first, then use setValue to fill in the list data.

https://phpword.readthedocs.io/en/latest/templates-processing.html

But! This function performs extremely poorly: processing 100 rows takes 20-odd seconds, and 10,000 rows takes 2,000 seconds = 33.3 minutes.

 

After a long search, the only thing I found was this discussion thread:

https://github.com/PHPOffice/PHPWord/issues/513

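I tried switching to str_replace as suggested there, but the speed did not change.

Someone further down the thread proposed a solution, but it was never merged upstream, so you have to patch the library yourself.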

 

Here is the solution:

--

First of all, thanks to the authors for this great project, and sorry for my approximate English.

I think there is a design problem that shows up when processing large volumes of data (by the way, processing hundreds or thousands of lines can be common in many tasks).

From my point of view, the problem is that values are added after the rows have been cloned. Each time you add (clone) a row, the processing time grows in two ways:

- There is one more search/replace operation to do.
- The amount of XML data to process in every search/replace gets bigger.

This means that if the number of rows you want to process is n, the processing time will be something like n^2 (n*n).

Using str_replace, or passing the search/replace values as arrays, speeds things up a little, but the processing time will still grow by something like n^2 (n*n).

The solution might be to do the search/replace while cloning. That way, the search/replace operations are done only on the cloned part, not on the whole document. The processing time then grows by n, not by n^2.

This way, it can generate thousands of rows in less than a second...
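The difference between the two strategies can be sketched on plain strings, without PHPWord. This is only an illustration under stated assumptions: the function names cloneThenReplace and replaceWhileCloning, the sample row XML, and the sample values are all hypothetical and are not part of the library.

```php
<?php
// Hypothetical stand-in for one table row of the template XML.
$rowXml = '<w:tr><w:t>${name}</w:t><w:t>${qty}</w:t></w:tr>';

// Hypothetical row data, keyed from 1.
$rows = [
    1 => ['name' => 'apple', 'qty' => '3'],
    2 => ['name' => 'pear',  'qty' => '5'],
    3 => ['name' => 'plum',  'qty' => '8'],
];

// Strategy A: clone every row first, then replace on the whole document.
// Each str_replace call scans all n cloned rows, and there are n rounds
// of calls, so the work grows roughly with n * n.
function cloneThenReplace(string $rowXml, array $rows): string
{
    $doc = '';
    foreach (array_keys($rows) as $i) {
        // Rename ${name} to ${name#1}, ${name#2}, ... for each clone.
        $doc .= preg_replace_callback(
            '/\$\{(.*?)\}/',
            fn($m) => '${' . $m[1] . '#' . $i . '}',
            $rowXml
        );
    }
    foreach ($rows as $i => $values) {
        foreach ($values as $key => $value) {
            $doc = str_replace('${' . $key . '#' . $i . '}', $value, $doc);
        }
    }
    return $doc;
}

// Strategy B: replace inside each clone before appending it.
// Each str_replace call scans a single row, so the work grows with n.
function replaceWhileCloning(string $rowXml, array $rows): string
{
    $doc = '';
    foreach ($rows as $values) {
        $search = array_map(fn($k) => '${' . $k . '}', array_keys($values));
        $doc .= str_replace($search, array_values($values), $rowXml);
    }
    return $doc;
}

$a = cloneThenReplace($rowXml, $rows);
$b = replaceWhileCloning($rowXml, $rows);
```

Both functions build the same output here; only the amount of text scanned per replacement differs.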

I modified TemplateProcessor::cloneRow so that it can accept an array of values as an argument and do the replacement work while cloning.


public function cloneRow($search, $numberOfClones, $arRepl = null, $limit = self::MAXIMUM_REPLACEMENTS_DEFAULT)
{
    if ('${' !== substr($search, 0, 2) && '}' !== substr($search, -1)) {
        $search = '${' . $search . '}';
    }

    $tagPos = strpos($this->tempDocumentMainPart, $search);
    if (!$tagPos) {
        throw new Exception("Can not clone row, template variable not found or variable contains markup.");
    }

    $rowStart = $this->findRowStart($tagPos);
    $rowEnd = $this->findRowEnd($tagPos);
    $xmlRow = $this->getSlice($rowStart, $rowEnd);

    // Check if there's a cell spanning multiple rows.
    if (preg_match('#<w:vMerge w:val="restart"/>#', $xmlRow)) {
        // $extraRowStart = $rowEnd;
        $extraRowEnd = $rowEnd;
        while (true) {
            $extraRowStart = $this->findRowStart($extraRowEnd + 1);
            $extraRowEnd = $this->findRowEnd($extraRowEnd + 1);

            // If extraRowEnd is lower than 7, there was no next row found.
            if ($extraRowEnd < 7) {
                break;
            }

            // If tmpXmlRow doesn't contain continue, this row is no longer part of the spanned row.
            $tmpXmlRow = $this->getSlice($extraRowStart, $extraRowEnd);
            if (!preg_match('#<w:vMerge/>#', $tmpXmlRow) &&
                !preg_match('#<w:vMerge w:val="continue" />#', $tmpXmlRow)) {
                break;
            }
            // This row was a spanned row, update $rowEnd and search for the next row.
            $rowEnd = $extraRowEnd;
        }
        $xmlRow = $this->getSlice($rowStart, $rowEnd);
    }

    $result = $this->getSlice(0, $rowStart);

    if (is_array($arRepl)) {
        // Values were passed in: fill each clone before appending it,
        // so every replacement only scans a single row.
        for ($i = 1; $i <= $numberOfClones; $i++) {
            $ar_search = array_keys($arRepl[$i]);
            $ar_repl = array_values($arRepl[$i]);

            foreach ($ar_search as &$item) {
                $item = self::ensureMacroCompleted($item);
            }
            unset($item); // drop the dangling reference
            foreach ($ar_repl as &$item) {
                $item = self::ensureUtf8Encoded($item);
            }
            unset($item);

            $result .= $this->setValueForPart($ar_search, $ar_repl, $xmlRow, $limit);
            //$result .= str_replace($ar_search, $ar_repl, $xmlRow);
        }
    } else {
        // No values passed: original behaviour, only clone and rename the macros.
        for ($i = 1; $i <= $numberOfClones; $i++) {
            $result .= preg_replace('/\$\{(.*?)\}/', '\${\\1#' . $i . '}', $xmlRow);
        }
    }

    $result .= $this->getSlice($rowEnd);

    $this->tempDocumentMainPart = $result;
}

Just call it by passing an array of arrays as the third parameter:

$array_row_values = array(
    1 => array("row1" => "value 1", "row2" => "value 2" /* etc. */), // values for first line
    2 => array("row1" => "value 1", "row2" => "value 2" /* etc. */), // values for second line
    3 => array("row1" => "value 1", "row2" => "value 2" /* etc. */), // values for third line
    /* etc. */
);

$templateProcessor->cloneRow('rowFieldName', 1000 /*or greater*/, $array_row_values);

The replacement will then be made directly while cloning.

The fourth parameter (limit) is entirely optional.

If you call it the normal way (with only two parameters), it will still behave the normal way (it just clones the row and doesn't apply values).
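Note that the modified cloneRow reads $arRepl[$i] for $i from 1 to $numberOfClones, so the array has to be keyed starting at 1. Rows that come from a database or from array_map are usually keyed from 0. A minimal sketch of re-keying them, assuming a hypothetical helper named toOneBasedRows and made-up sample rows (neither is part of PHPWord):

```php
<?php
// Hypothetical helper: re-key a list of rows so the first row has key 1,
// which is the layout the modified cloneRow indexes into.
function toOneBasedRows(array $rows): array
{
    if ($rows === []) {
        return []; // range(1, 0) would count downwards, so handle this case first
    }
    return array_combine(range(1, count($rows)), array_values($rows));
}

// Made-up sample data, keyed from 0 as most query results are.
$dbRows = [
    ['userId' => 7, 'userName' => 'Ann'],
    ['userId' => 9, 'userName' => 'Bob'],
];

$arRepl = toOneBasedRows($dbRows);

// With the patched class, the call would then look like:
// $templateProcessor->cloneRow('userId', count($arRepl), $arRepl);
```

Passing count($arRepl) as the number of clones keeps the loop bound and the array size in step.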

I tested it with the following code (using the template from Sample 7):


echo date('H:i:s'), ' Creating new TemplateProcessor instance...';
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('Sample_07.docx');

// Variables on different parts of document
$templateProcessor->setValue('weekday', date('l')); // On section/content
$templateProcessor->setValue('time', date('H:i')); // On footer
$templateProcessor->setValue('serverName', realpath(__DIR__)); // On header

$nbRSimple=1000;
$nbRcomplexe=1000;


$arValues_ClonesSimples = array();
for ($i = 1; $i <= $nbRSimple; $i++) {
    $arValues_ClonesSimples[$i] = array(
        "rowValue" => rand(10000000, 99999999),
        "rowNumber" => $i
    );
}

// Simple table
$templateProcessor->cloneRow('rowValue', $nbRSimple, $arValues_ClonesSimples);

// Complex table (loop bound is $nbRcomplexe so it matches the cloneRow call below)
$arValues_ClonesComplexes = array();
for ($i = 1; $i <= $nbRcomplexe; $i++) {
    $arValues_ClonesComplexes[$i] = array(
        "userId" => rand(10000000, 99999999),
        "userFirstName" => rand(10000000, 99999999),
        "userName" => rand(10000000, 99999999),
        "userPhone" => rand(10000000, 99999999),
    );
}

$templateProcessor->cloneRow('userId', $nbRcomplexe, $arValues_ClonesComplexes);

echo date('H:i:s'), ' Saving the result document...';
$templateProcessor->saveAs('Sample_07_out.docx');

--
