ILIAS  release_5-2 Revision v5.2.25-18-g3f80b828510
PHPExcel_Reader_HTML Class Reference

Public Member Functions

 __construct ()
 Create a new PHPExcel_Reader_HTML.
 
 load ($pFilename)
 Loads PHPExcel from file.
 
 setInputEncoding ($pValue='ANSI')
 Set input encoding.
 
 getInputEncoding ()
 Get input encoding.
 
 loadIntoExisting ($pFilename, PHPExcel $objPHPExcel)
 Loads PHPExcel from file into PHPExcel instance.
 
 getSheetIndex ()
 Get sheet index.
 
 setSheetIndex ($pValue=0)
 Set sheet index.
 
 securityScan ($xml)
 Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.
 
- Public Member Functions inherited from PHPExcel_Reader_Abstract
 getReadDataOnly ()
 Read data only? If this is true, then the Reader will only read data values for cells; it will not read any formatting information.
 
 setReadDataOnly ($pValue=FALSE)
 Set read data only. Set to true to advise the Reader to read only data values for cells and to ignore any formatting information.
 
 getIncludeCharts ()
 Read charts in workbook? If this is true, then the Reader will include any charts that exist in the workbook.
 
 setIncludeCharts ($pValue=FALSE)
 Set read charts in workbook. Set to true to advise the Reader to include any charts that exist in the workbook.
 
 getLoadSheetsOnly ()
 Get which sheets to load. Returns either an array of worksheet names (the list of worksheets that should be loaded), or null, indicating that all worksheets in the workbook should be loaded.
 
 setLoadSheetsOnly ($value=NULL)
 Set which sheets to load.
 
 setLoadAllSheets ()
 Set all sheets to load. Tells the Reader to load all worksheets from the workbook.
 
 getReadFilter ()
 Read filter.
 
 setReadFilter (PHPExcel_Reader_IReadFilter $pValue)
 Set read filter.
 
 canRead ($pFilename)
 Can the current PHPExcel_Reader_IReader read the file?
 
 securityScan ($xml)
 Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.
 
 securityScanFile ($filestream)
 Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.
 
- Public Member Functions inherited from PHPExcel_Reader_IReader
 canRead ($pFilename)
 Can the current PHPExcel_Reader_IReader read the file?
 
 load ($pFilename)
 

Protected Member Functions

 _isValidFormat ()
 Validate that the current file is an HTML file.
 
 _setTableStartColumn ($column)
 
 _getTableStartColumn ()
 
 _releaseTableStartColumn ()
 
 _flushCell ($sheet, $column, $row, &$cellContent)
 
 _processDomElement (DOMNode $element, $sheet, &$row, &$column, &$cellContent, $format=null)
 
- Protected Member Functions inherited from PHPExcel_Reader_Abstract
 _openFile ($pFilename)
 Open file for reading.
 

Protected Attributes

 $_inputEncoding = 'ANSI'
 
 $_sheetIndex = 0
 
 $_formats
 
 $rowspan = array()
 
 $_dataArray = array()
 
 $_tableLevel = 0
 
 $_nestedColumn = array('A')
 
- Protected Attributes inherited from PHPExcel_Reader_Abstract
 $_readDataOnly = FALSE
 
 $_includeCharts = FALSE
 
 $_loadSheetsOnly = NULL
 
 $_readFilter = NULL
 
 $_fileHandle = NULL
 

Detailed Description

Definition at line 44 of file HTML.php.

Constructor & Destructor Documentation

◆ __construct()

PHPExcel_Reader_HTML::__construct ( )

Create a new PHPExcel_Reader_HTML.

Definition at line 109 of file HTML.php.

Member Function Documentation

◆ _flushCell()

PHPExcel_Reader_HTML::_flushCell ( $sheet, $column, $row, &$cellContent )
protected

Definition at line 196 of file HTML.php.

{
    if (is_string($cellContent)) {
        // Simple String content
        if (trim($cellContent) > '') {
            // Only actually write it if there's content in the string
//            echo 'FLUSH CELL: ' , $column , $row , ' => ' , $cellContent , '<br />';
            // Write to worksheet to be done here...
            // ... we return the cell so we can mess about with styles more easily
            $sheet->setCellValue($column . $row, $cellContent, true);
            $this->_dataArray[$row][$column] = $cellContent;
        }
    } else {
        // We have a Rich Text run
        // TODO
        $this->_dataArray[$row][$column] = 'RICH TEXT: ' . $cellContent;
    }
    $cellContent = (string) '';
}

References $column, and $row.

Referenced by _processDomElement().


◆ _getTableStartColumn()

PHPExcel_Reader_HTML::_getTableStartColumn ( )
protected

Definition at line 184 of file HTML.php.

{
    return $this->_nestedColumn[$this->_tableLevel];
}

References $_tableLevel.

Referenced by _processDomElement().


◆ _isValidFormat()

PHPExcel_Reader_HTML::_isValidFormat ( )
protected

Validate that the current file is an HTML file.

Returns
boolean

Definition at line 119 of file HTML.php.

{
    // Reading 2048 bytes should be enough to validate that the format is HTML
    $data = fread($this->_fileHandle, 2048);
    if ((strpos($data, '<') !== FALSE) &&
        (strlen($data) !== strlen(strip_tags($data)))) {
        return TRUE;
    }

    return FALSE;
}

References $data.

Referenced by loadIntoExisting().
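
The check in _isValidFormat() is a heuristic rather than a real HTML validation: the sample must contain a '<' and strip_tags() must shorten it. The sketch below reproduces that test in a stand-alone function so it can be tried without a file handle. The name looksLikeHtml() is hypothetical and not part of PHPExcel.

```php
<?php
// Hypothetical stand-alone version of the heuristic used by _isValidFormat().
// $data stands in for the first 2048 bytes read from the file.
function looksLikeHtml($data)
{
    // The sample must contain a '<' and must change when tags are stripped.
    return strpos($data, '<') !== false
        && strlen($data) !== strlen(strip_tags($data));
}

var_dump(looksLikeHtml('<html><body><p>Hello</p></body></html>')); // bool(true)
var_dump(looksLikeHtml('plain text, no markup'));                  // bool(false)
```

Because only the presence of markup is tested, any text file containing a tag-like sequence in its first 2048 bytes would pass this check.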


◆ _processDomElement()

PHPExcel_Reader_HTML::_processDomElement ( DOMNode $element, $sheet, &$row, &$column, &$cellContent, $format = null )
protected

Definition at line 216 of file HTML.php.

{
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMText) {
            $domText = preg_replace('/\s+/u', ' ', trim($child->nodeValue));
            if (is_string($cellContent)) {
                // simply append the text if the cell content is a plain text string
                $cellContent .= $domText;
            } else {
                // but if we have a rich text run instead, we need to append it correctly
                // TODO
            }
        } elseif ($child instanceof DOMElement) {
//            echo '<b>DOM ELEMENT: </b>' , strtoupper($child->nodeName) , '<br />';

            $attributeArray = array();
            foreach ($child->attributes as $attribute) {
//                echo '<b>ATTRIBUTE: </b>' , $attribute->name , ' => ' , $attribute->value , '<br />';
                $attributeArray[$attribute->name] = $attribute->value;
            }

            switch ($child->nodeName) {
                case 'meta' :
                    foreach ($attributeArray as $attributeName => $attributeValue) {
                        switch ($attributeName) {
                            case 'content':
                                // TODO
                                // Extract character set, so we can convert to UTF-8 if required
                                break;
                        }
                    }
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
                    break;
                case 'title' :
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
                    $sheet->setTitle($cellContent);
                    $cellContent = '';
                    break;
                case 'span' :
                case 'div' :
                case 'font' :
                case 'i' :
                case 'em' :
                case 'strong':
                case 'b' :
//                    echo 'STYLING, SPAN OR DIV<br />';
                    if ($cellContent > '')
                        $cellContent .= ' ';
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
                    if ($cellContent > '')
                        $cellContent .= ' ';
//                    echo 'END OF STYLING, SPAN OR DIV<br />';
                    break;
                case 'hr' :
                    $this->_flushCell($sheet, $column, $row, $cellContent);
                    ++$row;
                    if (isset($this->_formats[$child->nodeName])) {
                        $sheet->getStyle($column . $row)->applyFromArray($this->_formats[$child->nodeName]);
                    } else {
                        $cellContent = '----------';
                        $this->_flushCell($sheet, $column, $row, $cellContent);
                    }
                    ++$row;
                    // No break here: execution falls through to the 'br' handling
                case 'br' :
                    if ($this->_tableLevel > 0) {
                        // If we're inside a table, replace with a \n
                        $cellContent .= "\n";
                    } else {
                        // Otherwise flush our existing content and move the row cursor on
                        $this->_flushCell($sheet, $column, $row, $cellContent);
                        ++$row;
                    }
//                    echo 'HARD LINE BREAK: ' , '<br />';
                    break;
                case 'a' :
//                    echo 'START OF HYPERLINK: ' , '<br />';
                    foreach ($attributeArray as $attributeName => $attributeValue) {
                        switch ($attributeName) {
                            case 'href':
//                                echo 'Link to ' , $attributeValue , '<br />';
                                $sheet->getCell($column . $row)->getHyperlink()->setUrl($attributeValue);
                                if (isset($this->_formats[$child->nodeName])) {
                                    $sheet->getStyle($column . $row)->applyFromArray($this->_formats[$child->nodeName]);
                                }
                                break;
                        }
                    }
                    $cellContent .= ' ';
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                    echo 'END OF HYPERLINK:' , '<br />';
                    break;
                case 'h1' :
                case 'h2' :
                case 'h3' :
                case 'h4' :
                case 'h5' :
                case 'h6' :
                case 'ol' :
                case 'ul' :
                case 'p' :
                    if ($this->_tableLevel > 0) {
                        // If we're inside a table, replace with a \n
                        $cellContent .= "\n";
//                        echo 'LIST ENTRY: ' , '<br />';
                        $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                        echo 'END OF LIST ENTRY:' , '<br />';
                    } else {
                        if ($cellContent > '') {
                            $this->_flushCell($sheet, $column, $row, $cellContent);
                            $row++;
                        }
//                        echo 'START OF PARAGRAPH: ' , '<br />';
                        $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                        echo 'END OF PARAGRAPH:' , '<br />';
                        $this->_flushCell($sheet, $column, $row, $cellContent);

                        if (isset($this->_formats[$child->nodeName])) {
                            $sheet->getStyle($column . $row)->applyFromArray($this->_formats[$child->nodeName]);
                        }

                        $row++;
                        $column = 'A';
                    }
                    break;
                case 'li' :
                    if ($this->_tableLevel > 0) {
                        // If we're inside a table, replace with a \n
                        $cellContent .= "\n";
//                        echo 'LIST ENTRY: ' , '<br />';
                        $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                        echo 'END OF LIST ENTRY:' , '<br />';
                    } else {
                        if ($cellContent > '') {
                            $this->_flushCell($sheet, $column, $row, $cellContent);
                        }
                        ++$row;
//                        echo 'LIST ENTRY: ' , '<br />';
                        $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                        echo 'END OF LIST ENTRY:' , '<br />';
                        $this->_flushCell($sheet, $column, $row, $cellContent);
                        $column = 'A';
                    }
                    break;
                case 'table' :
                    $this->_flushCell($sheet, $column, $row, $cellContent);
                    $column = $this->_setTableStartColumn($column);
//                    echo 'START OF TABLE LEVEL ' , $this->_tableLevel , '<br />';
                    if ($this->_tableLevel > 1)
                        --$row;
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                    echo 'END OF TABLE LEVEL ' , $this->_tableLevel , '<br />';
                    $column = $this->_releaseTableStartColumn();
                    if ($this->_tableLevel > 1) {
                        ++$column;
                    } else {
                        ++$row;
                    }
                    break;
                case 'thead' :
                case 'tbody' :
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
                    break;
                case 'tr' :
                    $column = $this->_getTableStartColumn();
                    $cellContent = '';
//                    echo 'START OF TABLE ' , $this->_tableLevel , ' ROW<br />';
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
                    ++$row;
//                    echo 'END OF TABLE ' , $this->_tableLevel , ' ROW<br />';
                    break;
                case 'th' :
                case 'td' :
//                    echo 'START OF TABLE ' , $this->_tableLevel , ' CELL<br />';
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
//                    echo 'END OF TABLE ' , $this->_tableLevel , ' CELL<br />';

                    while (isset($this->rowspan[$column . $row])) {
                        ++$column;
                    }

                    $this->_flushCell($sheet, $column, $row, $cellContent);

//                    if (isset($attributeArray['style']) && !empty($attributeArray['style'])) {
//                        $styleAry = $this->getPhpExcelStyleArray($attributeArray['style']);
//
//                        if (!empty($styleAry)) {
//                            $sheet->getStyle($column . $row)->applyFromArray($styleAry);
//                        }
//                    }

                    if (isset($attributeArray['rowspan']) && isset($attributeArray['colspan'])) {
                        //create merging rowspan and colspan
                        $columnTo = $column;
                        for ($i = 0; $i < $attributeArray['colspan'] - 1; $i++) {
                            ++$columnTo;
                        }
                        $range = $column . $row . ':' . $columnTo . ($row + $attributeArray['rowspan'] - 1);
                        foreach (\PHPExcel_Cell::extractAllCellReferencesInRange($range) as $value) {
                            $this->rowspan[$value] = true;
                        }
                        $sheet->mergeCells($range);
                        $column = $columnTo;
                    } elseif (isset($attributeArray['rowspan'])) {
                        //create merging rowspan
                        $range = $column . $row . ':' . $column . ($row + $attributeArray['rowspan'] - 1);
                        foreach (\PHPExcel_Cell::extractAllCellReferencesInRange($range) as $value) {
                            $this->rowspan[$value] = true;
                        }
                        $sheet->mergeCells($range);
                    } elseif (isset($attributeArray['colspan'])) {
                        //create merging colspan
                        $columnTo = $column;
                        for ($i = 0; $i < $attributeArray['colspan'] - 1; $i++) {
                            ++$columnTo;
                        }
                        $sheet->mergeCells($column . $row . ':' . $columnTo . $row);
                        $column = $columnTo;
                    }
                    ++$column;
                    break;
                case 'body' :
                    $row = 1;
                    $column = 'A';
                    $content = '';
                    $this->_tableLevel = 0;
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
                    break;
                default:
                    $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
            }
        }
    }
}

References $column, $row, _flushCell(), _getTableStartColumn(), _processDomElement(), _releaseTableStartColumn(), _setTableStartColumn(), and PHPExcel_Cell\extractAllCellReferencesInRange().

Referenced by _processDomElement(), and loadIntoExisting().
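
The 'th'/'td' branch builds merge ranges by incrementing a column letter with ++, which relies on PHP's string increment ('Z' becomes 'AA'). The following is a minimal sketch of that range computation in isolation. The helper name mergeRange() and its default arguments are assumptions for illustration and do not exist in PHPExcel.

```php
<?php
// Hypothetical helper: builds the same "A1:C2"-style range that the
// 'th'/'td' branch passes to mergeCells() for a colspan/rowspan cell.
function mergeRange($column, $row, $colspan = 1, $rowspan = 1)
{
    $columnTo = $column;
    for ($i = 0; $i < $colspan - 1; $i++) {
        ++$columnTo; // PHP string increment: 'A' -> 'B', 'Z' -> 'AA'
    }

    return $column . $row . ':' . $columnTo . ($row + $rowspan - 1);
}

echo mergeRange('A', 1, 3, 2), "\n"; // A1:C2
echo mergeRange('Z', 5, 2), "\n";    // Z5:AA5
```

With colspan and rowspan both left at 1, the result degenerates to a single-cell range such as "B4:B4"; the reader avoids that case by only merging when one of the attributes is set.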


◆ _releaseTableStartColumn()

PHPExcel_Reader_HTML::_releaseTableStartColumn ( )
protected

Definition at line 189 of file HTML.php.

{
    --$this->_tableLevel;

    return array_pop($this->_nestedColumn);
}

References $_tableLevel.

Referenced by _processDomElement().


◆ _setTableStartColumn()

PHPExcel_Reader_HTML::_setTableStartColumn (   $column)
protected

Definition at line 174 of file HTML.php.

{
    if ($this->_tableLevel == 0)
        $column = 'A';
    ++$this->_tableLevel;
    $this->_nestedColumn[$this->_tableLevel] = $column;

    return $this->_nestedColumn[$this->_tableLevel];
}

References $_tableLevel, and $column.

Referenced by _processDomElement().
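
Together, _setTableStartColumn(), _getTableStartColumn() and _releaseTableStartColumn() behave like a small stack of start columns, one entry per table nesting level. The class below is a hypothetical stand-alone model of that bookkeeping. It assumes the level counter is incremented when a table is entered and decremented when it is released, as the references to $_tableLevel suggest. TableColumnStack and its method names are not part of PHPExcel.

```php
<?php
// Hypothetical model of the $_tableLevel / $_nestedColumn bookkeeping.
class TableColumnStack
{
    private $level = 0;             // mirrors $_tableLevel
    private $nested = array('A');   // mirrors $_nestedColumn

    // Counterpart of _setTableStartColumn(): a top-level table starts in 'A'.
    public function enter($column)
    {
        if ($this->level == 0) {
            $column = 'A';
        }
        ++$this->level;
        $this->nested[$this->level] = $column;

        return $column;
    }

    // Counterpart of _getTableStartColumn().
    public function current()
    {
        return $this->nested[$this->level];
    }

    // Counterpart of _releaseTableStartColumn().
    public function leave()
    {
        --$this->level;

        return array_pop($this->nested);
    }
}

$stack = new TableColumnStack();
echo $stack->enter('D'), "\n"; // A (outer table is forced to column A)
echo $stack->enter('C'), "\n"; // C (nested table starts where the cursor is)
echo $stack->leave(), "\n";    // C
echo $stack->current(), "\n";  // A
```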


◆ getInputEncoding()

PHPExcel_Reader_HTML::getInputEncoding ( )

Get input encoding.

Returns
string

Definition at line 164 of file HTML.php.

{
    return $this->_inputEncoding;
}

References $_inputEncoding.

◆ getSheetIndex()

PHPExcel_Reader_HTML::getSheetIndex ( )

Get sheet index.

Returns
int

Definition at line 500 of file HTML.php.

{
    return $this->_sheetIndex;
}

References $_sheetIndex.

◆ load()

PHPExcel_Reader_HTML::load (   $pFilename)

Loads PHPExcel from file.

Parameters
string $pFilename
Returns
PHPExcel
Exceptions
PHPExcel_Reader_Exception

Implements PHPExcel_Reader_IReader.

Definition at line 138 of file HTML.php.

{
    // Create new PHPExcel
    $objPHPExcel = new PHPExcel();

    // Load into this instance
    return $this->loadIntoExisting($pFilename, $objPHPExcel);
}

References $objPHPExcel, and loadIntoExisting().


◆ loadIntoExisting()

PHPExcel_Reader_HTML::loadIntoExisting ( $pFilename, PHPExcel $objPHPExcel )

Loads PHPExcel from file into PHPExcel instance.

Parameters
string $pFilename
PHPExcel $objPHPExcel
Returns
PHPExcel
Exceptions
PHPExcel_Reader_Exception

Definition at line 458 of file HTML.php.

{
    // Open file to validate
    $this->_openFile($pFilename);
    if (!$this->_isValidFormat()) {
        fclose($this->_fileHandle);
        throw new PHPExcel_Reader_Exception($pFilename . " is an Invalid HTML file.");
    }
    // Close after validating
    fclose($this->_fileHandle);

    // Create new PHPExcel
    while ($objPHPExcel->getSheetCount() <= $this->_sheetIndex) {
        $objPHPExcel->createSheet();
    }
    $objPHPExcel->setActiveSheetIndex($this->_sheetIndex);

    // Create a new DOM object
    $dom = new domDocument;
    // Reload the HTML file into the DOM object
    $loaded = $dom->loadHTML($this->securityScanFile($pFilename));
    if ($loaded === FALSE) {
        // Build the message by concatenation; comma-separated arguments would be
        // passed to the exception constructor as $code and $previous instead
        throw new PHPExcel_Reader_Exception('Failed to load ' . $pFilename . ' as a DOM Document');
    }

    // Discard white space
    $dom->preserveWhiteSpace = false;

    $row = 0;
    $column = 'A';
    $content = '';
    $this->_processDomElement($dom, $objPHPExcel->getActiveSheet(), $row, $column, $content);

    // Return
    return $objPHPExcel;
}

References $column, $objPHPExcel, $row, _isValidFormat(), PHPExcel_Reader_Abstract\_openFile(), _processDomElement(), and PHPExcel_Reader_Abstract\securityScanFile().

Referenced by load().


◆ securityScan()

PHPExcel_Reader_HTML::securityScan (   $xml)

Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.

Parameters
string $xml
Exceptions
PHPExcel_Reader_Exception

Reimplemented from PHPExcel_Reader_Abstract.

Definition at line 524 of file HTML.php.

{
    $pattern = '/\\0?' . implode('\\0?', str_split('<!ENTITY')) . '\\0?/';
    if (preg_match($pattern, $xml)) {
        throw new PHPExcel_Reader_Exception('Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
    }
    return $xml;
}
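
The pattern built in securityScan() allows an optional NUL byte before, between and after the characters of '<!ENTITY', so the check also matches input where every character is followed by a zero byte, as in UTF-16LE text. The sketch below isolates that pattern in a hypothetical function, containsEntityDeclaration(), which returns a boolean instead of throwing. It illustrates the regular expression only and does not replace the method above.

```php
<?php
// Hypothetical boolean variant of the securityScan() check.
function containsEntityDeclaration($xml)
{
    // Same construction as securityScan(): '\0?' (optional NUL byte) is placed
    // before, between and after each character of '<!ENTITY'.
    $pattern = '/\\0?' . implode('\\0?', str_split('<!ENTITY')) . '\\0?/';

    return preg_match($pattern, $xml) === 1;
}

// Plain single-byte input with an entity declaration.
var_dump(containsEntityDeclaration('<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]>')); // bool(true)

// The same marker with a NUL byte after every character.
$interleaved = implode("\0", str_split('<!ENTITY')) . "\0";
var_dump(containsEntityDeclaration($interleaved)); // bool(true)

// Ordinary markup without an entity declaration.
var_dump(containsEntityDeclaration('<p>Hello</p>')); // bool(false)
```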

◆ setInputEncoding()

PHPExcel_Reader_HTML::setInputEncoding (   $pValue = 'ANSI')

Set input encoding.

Parameters
string $pValue Input encoding

Definition at line 152 of file HTML.php.

{
    $this->_inputEncoding = $pValue;

    return $this;
}

◆ setSheetIndex()

PHPExcel_Reader_HTML::setSheetIndex (   $pValue = 0)

Set sheet index.

Parameters
int $pValue Sheet index
Returns
PHPExcel_Reader_HTML

Definition at line 511 of file HTML.php.

{
    $this->_sheetIndex = $pValue;

    return $this;
}

Field Documentation

◆ $_dataArray

PHPExcel_Reader_HTML::$_dataArray = array()
protected

Definition at line 170 of file HTML.php.

◆ $_formats

PHPExcel_Reader_HTML::$_formats
protected

Definition at line 66 of file HTML.php.

◆ $_inputEncoding

PHPExcel_Reader_HTML::$_inputEncoding = 'ANSI'
protected

Definition at line 52 of file HTML.php.

Referenced by getInputEncoding().

◆ $_nestedColumn

PHPExcel_Reader_HTML::$_nestedColumn = array('A')
protected

Definition at line 172 of file HTML.php.

◆ $_sheetIndex

PHPExcel_Reader_HTML::$_sheetIndex = 0
protected

Definition at line 59 of file HTML.php.

Referenced by getSheetIndex().

◆ $_tableLevel

PHPExcel_Reader_HTML::$_tableLevel = 0
protected

Definition at line 171 of file HTML.php.

Referenced by _getTableStartColumn(), _releaseTableStartColumn(), and _setTableStartColumn().

◆ $rowspan

PHPExcel_Reader_HTML::$rowspan = array()
protected

Definition at line 104 of file HTML.php.


The documentation for this class was generated from the following file: