ILIAS  release_5-3 Revision v5.3.23-19-g915713cf615
PHPExcel_Reader_HTML Class Reference

Public Member Functions

 __construct ()
 Create a new PHPExcel_Reader_HTML.
 
 load ($pFilename)
 Loads PHPExcel from file.
 
 setInputEncoding ($pValue='ANSI')
 Set input encoding.
 
 getInputEncoding ()
 Get input encoding.
 
 loadIntoExisting ($pFilename, PHPExcel $objPHPExcel)
 Loads PHPExcel from file into PHPExcel instance.
 
 getSheetIndex ()
 Get sheet index.
 
 setSheetIndex ($pValue=0)
 Set sheet index.
 
 securityScan ($xml)
 Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.
 
- Public Member Functions inherited from PHPExcel_Reader_Abstract
 getReadDataOnly ()
 Read data only? If this is true, the Reader reads only the data values for cells and does not read any formatting information.
 
 setReadDataOnly ($pValue=FALSE)
 Set read data only. Set to true to advise the Reader to read only the data values for cells and to ignore any formatting information.
 
 getIncludeCharts ()
 Read charts in workbook? If this is true, the Reader includes any charts that exist in the workbook.
 
 setIncludeCharts ($pValue=FALSE)
 Set read charts in workbook. Set to true to advise the Reader to include any charts that exist in the workbook.
 
 getLoadSheetsOnly ()
 Get which sheets to load. Returns either an array of worksheet names (the list of worksheets that should be loaded), or null, indicating that all worksheets in the workbook should be loaded.
 
 setLoadSheetsOnly ($value=NULL)
 Set which sheets to load.
 
 setLoadAllSheets ()
 Set all sheets to load. Tells the Reader to load all worksheets from the workbook.
 
 getReadFilter ()
 Read filter.
 
 setReadFilter (PHPExcel_Reader_IReadFilter $pValue)
 Set read filter.
 
 canRead ($pFilename)
 Can the current PHPExcel_Reader_IReader read the file?
 
 securityScan ($xml)
 Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.
 
 securityScanFile ($filestream)
 Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.
 

Protected Member Functions

 _isValidFormat ()
 Validate that the current file is an HTML file.
 
 _setTableStartColumn ($column)
 
 _getTableStartColumn ()
 
 _releaseTableStartColumn ()
 
 _flushCell ($sheet, $column, $row, &$cellContent)
 
 _processDomElement (DOMNode $element, $sheet, &$row, &$column, &$cellContent, $format=null)
 
- Protected Member Functions inherited from PHPExcel_Reader_Abstract
 _openFile ($pFilename)
 Open file for reading.
 

Protected Attributes

 $_inputEncoding = 'ANSI'
 
 $_sheetIndex = 0
 
 $_formats
 
 $rowspan = array()
 
 $_dataArray = array()
 
 $_tableLevel = 0
 
 $_nestedColumn = array('A')
 
- Protected Attributes inherited from PHPExcel_Reader_Abstract
 $_readDataOnly = FALSE
 
 $_includeCharts = FALSE
 
 $_loadSheetsOnly = NULL
 
 $_readFilter = NULL
 
 $_fileHandle = NULL
 

Detailed Description

Definition at line 44 of file HTML.php.

Constructor & Destructor Documentation

◆ __construct()

PHPExcel_Reader_HTML::__construct ( )

Create a new PHPExcel_Reader_HTML.

Definition at line 109 of file HTML.php.

110  {
111  $this->_readFilter = new PHPExcel_Reader_DefaultReadFilter();
112  }

Member Function Documentation

◆ _flushCell()

PHPExcel_Reader_HTML::_flushCell ( $sheet, $column, $row, & $cellContent )
protected

Definition at line 196 of file HTML.php.

References $column, $row, and string.

Referenced by _processDomElement().

197  {
198  if (is_string($cellContent)) {
199  // Simple String content
200  if (trim($cellContent) > '') {
201  // Only actually write it if there's content in the string
202 // echo 'FLUSH CELL: ' , $column , $row , ' => ' , $cellContent , '<br />';
203  // Write to worksheet to be done here...
204  // ... we return the cell so we can mess about with styles more easily
205  $sheet->setCellValue($column . $row, $cellContent, true);
206  $this->_dataArray[$row][$column] = $cellContent;
207  }
208  } else {
209  // We have a Rich Text run
210  // TODO
211  $this->_dataArray[$row][$column] = 'RICH TEXT: ' . $cellContent;
212  }
213  $cellContent = (string) '';
214  }

◆ _getTableStartColumn()

PHPExcel_Reader_HTML::_getTableStartColumn ( )
protected

Definition at line 184 of file HTML.php.

References $_tableLevel.

Referenced by _processDomElement().

185  {
186  return $this->_nestedColumn[$this->_tableLevel];
187  }

◆ _isValidFormat()

PHPExcel_Reader_HTML::_isValidFormat ( )
protected

Validate that the current file is an HTML file.

Returns
boolean

Definition at line 119 of file HTML.php.

References $data.

Referenced by loadIntoExisting().

120  {
121  // Reading 2048 bytes should be enough to validate that the format is HTML
122  $data = fread($this->_fileHandle, 2048);
123  if ((strpos($data, '<') !== FALSE) &&
124  (strlen($data) !== strlen(strip_tags($data)))) {
125  return TRUE;
126  }
127 
128  return FALSE;
129  }
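
The format check above can be tried in isolation. The following is a minimal sketch, assuming the first chunk of the file is already available as a string. The function name looksLikeHtml is hypothetical and not part of PHPExcel.

```php
<?php
// Hypothetical standalone version of the heuristic used by _isValidFormat():
// the data is treated as HTML if it contains a '<' and strip_tags()
// actually removes something from it.
function looksLikeHtml($data)
{
    return strpos($data, '<') !== false
        && strlen($data) !== strlen(strip_tags($data));
}

var_dump(looksLikeHtml('<p>Hello</p>')); // markup present
var_dump(looksLikeHtml('Name,Qty'));     // plain text, no markup
```

The check is deliberately loose: any input with at least one tag in the inspected chunk passes, so it should not be read as full HTML validation.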
+ Here is the caller graph for this function:

◆ _processDomElement()

PHPExcel_Reader_HTML::_processDomElement ( DOMNode $element, $sheet, & $row, & $column, & $cellContent, $format = null )
protected

Definition at line 216 of file HTML.php.

References $column, $i, $row, _flushCell(), _getTableStartColumn(), _releaseTableStartColumn(), _setTableStartColumn(), array, and PHPExcel_Cell\extractAllCellReferencesInRange().

Referenced by loadIntoExisting().

217  {
218  foreach ($element->childNodes as $child) {
219  if ($child instanceof DOMText) {
220  $domText = preg_replace('/\s+/u', ' ', trim($child->nodeValue));
221  if (is_string($cellContent)) {
222  // simply append the text if the cell content is a plain text string
223  $cellContent .= $domText;
224  } else {
225  // but if we have a rich text run instead, we need to append it correctly
226  // TODO
227  }
228  } elseif ($child instanceof DOMElement) {
229 // echo '<b>DOM ELEMENT: </b>' , strtoupper($child->nodeName) , '<br />';
230 
231  $attributeArray = array();
232  foreach ($child->attributes as $attribute) {
233 // echo '<b>ATTRIBUTE: </b>' , $attribute->name , ' => ' , $attribute->value , '<br />';
234  $attributeArray[$attribute->name] = $attribute->value;
235  }
236 
237  switch ($child->nodeName) {
238  case 'meta' :
239  foreach ($attributeArray as $attributeName => $attributeValue) {
240  switch ($attributeName) {
241  case 'content':
242  // TODO
243  // Extract character set, so we can convert to UTF-8 if required
244  break;
245  }
246  }
247  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
248  break;
249  case 'title' :
250  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
251  $sheet->setTitle($cellContent);
252  $cellContent = '';
253  break;
254  case 'span' :
255  case 'div' :
256  case 'font' :
257  case 'i' :
258  case 'em' :
259  case 'strong':
260  case 'b' :
261 // echo 'STYLING, SPAN OR DIV<br />';
262  if ($cellContent > '')
263  $cellContent .= ' ';
264  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
265  if ($cellContent > '')
266  $cellContent .= ' ';
267 // echo 'END OF STYLING, SPAN OR DIV<br />';
268  break;
269  case 'hr' :
270  $this->_flushCell($sheet, $column, $row, $cellContent);
271  ++$row;
272  if (isset($this->_formats[$child->nodeName])) {
273  $sheet->getStyle($column . $row)->applyFromArray($this->_formats[$child->nodeName]);
274  } else {
275  $cellContent = '----------';
276  $this->_flushCell($sheet, $column, $row, $cellContent);
277  }
278  ++$row;
279  case 'br' :
280  if ($this->_tableLevel > 0) {
281  // If we're inside a table, replace with a \n
282  $cellContent .= "\n";
283  } else {
284  // Otherwise flush our existing content and move the row cursor on
285  $this->_flushCell($sheet, $column, $row, $cellContent);
286  ++$row;
287  }
288 // echo 'HARD LINE BREAK: ' , '<br />';
289  break;
290  case 'a' :
291 // echo 'START OF HYPERLINK: ' , '<br />';
292  foreach ($attributeArray as $attributeName => $attributeValue) {
293  switch ($attributeName) {
294  case 'href':
295 // echo 'Link to ' , $attributeValue , '<br />';
296  $sheet->getCell($column . $row)->getHyperlink()->setUrl($attributeValue);
297  if (isset($this->_formats[$child->nodeName])) {
298  $sheet->getStyle($column . $row)->applyFromArray($this->_formats[$child->nodeName]);
299  }
300  break;
301  }
302  }
303  $cellContent .= ' ';
304  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
305 // echo 'END OF HYPERLINK:' , '<br />';
306  break;
307  case 'h1' :
308  case 'h2' :
309  case 'h3' :
310  case 'h4' :
311  case 'h5' :
312  case 'h6' :
313  case 'ol' :
314  case 'ul' :
315  case 'p' :
316  if ($this->_tableLevel > 0) {
317  // If we're inside a table, replace with a \n
318  $cellContent .= "\n";
319 // echo 'LIST ENTRY: ' , '<br />';
320  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
321 // echo 'END OF LIST ENTRY:' , '<br />';
322  } else {
323  if ($cellContent > '') {
324  $this->_flushCell($sheet, $column, $row, $cellContent);
325  $row++;
326  }
327 // echo 'START OF PARAGRAPH: ' , '<br />';
328  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
329 // echo 'END OF PARAGRAPH:' , '<br />';
330  $this->_flushCell($sheet, $column, $row, $cellContent);
331 
332  if (isset($this->_formats[$child->nodeName])) {
333  $sheet->getStyle($column . $row)->applyFromArray($this->_formats[$child->nodeName]);
334  }
335 
336  $row++;
337  $column = 'A';
338  }
339  break;
340  case 'li' :
341  if ($this->_tableLevel > 0) {
342  // If we're inside a table, replace with a \n
343  $cellContent .= "\n";
344 // echo 'LIST ENTRY: ' , '<br />';
345  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
346 // echo 'END OF LIST ENTRY:' , '<br />';
347  } else {
348  if ($cellContent > '') {
349  $this->_flushCell($sheet, $column, $row, $cellContent);
350  }
351  ++$row;
352 // echo 'LIST ENTRY: ' , '<br />';
353  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
354 // echo 'END OF LIST ENTRY:' , '<br />';
355  $this->_flushCell($sheet, $column, $row, $cellContent);
356  $column = 'A';
357  }
358  break;
359  case 'table' :
360  $this->_flushCell($sheet, $column, $row, $cellContent);
361  $column = $this->_setTableStartColumn($column);
362 // echo 'START OF TABLE LEVEL ' , $this->_tableLevel , '<br />';
363  if ($this->_tableLevel > 1)
364  --$row;
365  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
366 // echo 'END OF TABLE LEVEL ' , $this->_tableLevel , '<br />';
367  $column = $this->_releaseTableStartColumn();
368  if ($this->_tableLevel > 1) {
369  ++$column;
370  } else {
371  ++$row;
372  }
373  break;
374  case 'thead' :
375  case 'tbody' :
376  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
377  break;
378  case 'tr' :
379  $column = $this->_getTableStartColumn();
380  $cellContent = '';
381 // echo 'START OF TABLE ' , $this->_tableLevel , ' ROW<br />';
382  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
383  ++$row;
384 // echo 'END OF TABLE ' , $this->_tableLevel , ' ROW<br />';
385  break;
386  case 'th' :
387  case 'td' :
388 // echo 'START OF TABLE ' , $this->_tableLevel , ' CELL<br />';
389  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
390 // echo 'END OF TABLE ' , $this->_tableLevel , ' CELL<br />';
391 
392  while (isset($this->rowspan[$column . $row])) {
393  ++$column;
394  }
395 
396  $this->_flushCell($sheet, $column, $row, $cellContent);
397 
398 // if (isset($attributeArray['style']) && !empty($attributeArray['style'])) {
399 // $styleAry = $this->getPhpExcelStyleArray($attributeArray['style']);
400 //
401 // if (!empty($styleAry)) {
402 // $sheet->getStyle($column . $row)->applyFromArray($styleAry);
403 // }
404 // }
405 
406  if (isset($attributeArray['rowspan']) && isset($attributeArray['colspan'])) {
407  //create merging rowspan and colspan
408  $columnTo = $column;
409  for ($i = 0; $i < $attributeArray['colspan'] - 1; $i++) {
410  ++$columnTo;
411  }
412  $range = $column . $row . ':' . $columnTo . ($row + $attributeArray['rowspan'] - 1);
413  foreach (\PHPExcel_Cell::extractAllCellReferencesInRange($range) as $value) {
414  $this->rowspan[$value] = true;
415  }
416  $sheet->mergeCells($range);
417  $column = $columnTo;
418  } elseif (isset($attributeArray['rowspan'])) {
419  //create merging rowspan
420  $range = $column . $row . ':' . $column . ($row + $attributeArray['rowspan'] - 1);
421  foreach (\PHPExcel_Cell::extractAllCellReferencesInRange($range) as $value) {
422  $this->rowspan[$value] = true;
423  }
424  $sheet->mergeCells($range);
425  } elseif (isset($attributeArray['colspan'])) {
426  //create merging colspan
427  $columnTo = $column;
428  for ($i = 0; $i < $attributeArray['colspan'] - 1; $i++) {
429  ++$columnTo;
430  }
431  $sheet->mergeCells($column . $row . ':' . $columnTo . $row);
432  $column = $columnTo;
433  }
434  ++$column;
435  break;
436  case 'body' :
437  $row = 1;
438  $column = 'A';
439  $content = '';
440  $this->_tableLevel = 0;
441  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
442  break;
443  default:
444  $this->_processDomElement($child, $sheet, $row, $column, $cellContent);
445  }
446  }
447  }
448  }
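
The td/th branch derives the merge range by incrementing the column letter once per extra spanned column. The sketch below isolates that arithmetic. The helper name mergeRange is hypothetical, and the sketch relies on PHP's string increment ('A' becomes 'B', 'Z' becomes 'AA').

```php
<?php
// Hypothetical helper mirroring the range arithmetic in the td/th branch.
// $colspan and $rowspan default to 1, meaning "no span in that direction".
function mergeRange($column, $row, $colspan = 1, $rowspan = 1)
{
    $columnTo = $column;
    for ($i = 0; $i < $colspan - 1; $i++) {
        ++$columnTo; // string increment walks the column letters
    }

    return $column . $row . ':' . $columnTo . ($row + $rowspan - 1);
}

echo mergeRange('B', 2, 3, 2), "\n"; // B2:D3
echo mergeRange('Z', 1, 2, 1), "\n"; // Z1:AA1
```

In the method itself the resulting range is passed to mergeCells(), and for row spans every covered coordinate is recorded in $rowspan so that later rows skip the occupied cells.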

◆ _releaseTableStartColumn()

PHPExcel_Reader_HTML::_releaseTableStartColumn ( )
protected

Definition at line 189 of file HTML.php.

References $_tableLevel.

Referenced by _processDomElement().

190  {
191  --$this->_tableLevel;
192 
193  return array_pop($this->_nestedColumn);
194  }

◆ _setTableStartColumn()

PHPExcel_Reader_HTML::_setTableStartColumn (   $column)
protected

Definition at line 174 of file HTML.php.

References $_tableLevel, and $column.

Referenced by _processDomElement().

175  {
176  if ($this->_tableLevel == 0)
177  $column = 'A';
178  ++$this->_tableLevel;
179  $this->_nestedColumn[$this->_tableLevel] = $column;
180 
181  return $this->_nestedColumn[$this->_tableLevel];
182  }
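
Taken together, _setTableStartColumn(), _getTableStartColumn() and _releaseTableStartColumn() behave like a small stack of start columns, one entry per table nesting level. The following standalone sketch illustrates that idea. The class name TableColumnStack is hypothetical, and the sketch assumes the level counter is incremented on entry and decremented on release.

```php
<?php
// Hypothetical standalone model of the start-column bookkeeping.
class TableColumnStack
{
    private $level = 0;            // plays the role of $_tableLevel
    private $columns = array('A'); // plays the role of $_nestedColumn

    public function enter($column)
    {
        if ($this->level == 0) {
            $column = 'A';         // a top-level table always starts in column A
        }
        ++$this->level;            // assumption: level goes up when a table opens
        $this->columns[$this->level] = $column;

        return $column;
    }

    public function current()
    {
        return $this->columns[$this->level];
    }

    public function leave()
    {
        --$this->level;            // assumption: level goes down when a table closes
        return array_pop($this->columns);
    }
}

$stack = new TableColumnStack();
$stack->enter('D');            // top-level table, forced to 'A'
$stack->enter('C');            // nested table starting in column C
echo $stack->current(), "\n";  // C
$stack->leave();               // nested table closed
echo $stack->current(), "\n";  // A
```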

◆ getInputEncoding()

PHPExcel_Reader_HTML::getInputEncoding ( )

Get input encoding.

Returns
string

Definition at line 164 of file HTML.php.

References $_inputEncoding.

165  {
166  return $this->_inputEncoding;
167  }

◆ getSheetIndex()

PHPExcel_Reader_HTML::getSheetIndex ( )

Get sheet index.

Returns
int

Definition at line 500 of file HTML.php.

References $_sheetIndex.

501  {
502  return $this->_sheetIndex;
503  }

◆ load()

PHPExcel_Reader_HTML::load (   $pFilename)

Loads PHPExcel from file.

Parameters
string $pFilename
Returns
PHPExcel
Exceptions
PHPExcel_Reader_Exception

Implements PHPExcel_Reader_IReader.

Definition at line 138 of file HTML.php.

References $objPHPExcel, and loadIntoExisting().

139  {
140  // Create new PHPExcel
141  $objPHPExcel = new PHPExcel();
142 
143  // Load into this instance
144  return $this->loadIntoExisting($pFilename, $objPHPExcel);
145  }
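
A typical call sequence, based only on the signatures documented on this page, might look like the sketch below. The helper loadHtmlSheet and the file name report.html are hypothetical. The helper accepts any object that offers a fluent setSheetIndex() and a load() method, as PHPExcel_Reader_HTML does.

```php
<?php
// Hypothetical helper, not part of PHPExcel: select the target sheet index,
// then load the HTML file. setSheetIndex() returns the reader itself, so the
// calls can be chained.
function loadHtmlSheet($reader, $filename, $sheetIndex = 0)
{
    return $reader->setSheetIndex($sheetIndex)->load($filename);
}

// Usage sketch, assuming the PHPExcel classes are autoloadable and that
// report.html exists:
//
// $workbook = loadHtmlSheet(new PHPExcel_Reader_HTML(), 'report.html', 1);
// $sheet    = $workbook->getActiveSheet();
```

Because load() always creates a fresh PHPExcel object, loadIntoExisting() is the method to call directly when the HTML should be added to a workbook that already holds other sheets.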

◆ loadIntoExisting()

PHPExcel_Reader_HTML::loadIntoExisting ( $pFilename, PHPExcel $objPHPExcel )

Loads PHPExcel from file into PHPExcel instance.

Parameters
string $pFilename
PHPExcel $objPHPExcel
Returns
PHPExcel
Exceptions
PHPExcel_Reader_Exception

Definition at line 458 of file HTML.php.

References $_sheetIndex, $column, $objPHPExcel, $row, _isValidFormat(), PHPExcel_Reader_Abstract\_openFile(), _processDomElement(), PHPExcel\createSheet(), PHPExcel\getActiveSheet(), PHPExcel\getSheetCount(), PHPExcel_Reader_Abstract\securityScanFile(), and PHPExcel\setActiveSheetIndex().

Referenced by load().

459  {
460  // Open file to validate
461  $this->_openFile($pFilename);
462  if (!$this->_isValidFormat()) {
463  fclose($this->_fileHandle);
464  throw new PHPExcel_Reader_Exception($pFilename . " is an Invalid HTML file.");
465  }
466  // Close after validating
467  fclose($this->_fileHandle);
468 
469  // Create new PHPExcel
470  while ($objPHPExcel->getSheetCount() <= $this->_sheetIndex) {
471  $objPHPExcel->createSheet();
472  }
473  $objPHPExcel->setActiveSheetIndex($this->_sheetIndex);
474 
475  // Create a new DOM object
476  $dom = new domDocument;
477  // Reload the HTML file into the DOM object
478  $loaded = $dom->loadHTML($this->securityScanFile($pFilename));
479  if ($loaded === FALSE) {
480  throw new PHPExcel_Reader_Exception('Failed to load ' . $pFilename . ' as a DOM Document');
481  }
482 
483  // Discard white space
484  $dom->preserveWhiteSpace = false;
485 
486  $row = 0;
487  $column = 'A';
488  $content = '';
489  $this->_processDomElement($dom, $objPHPExcel->getActiveSheet(), $row, $column, $content);
490 
491  // Return
492  return $objPHPExcel;
493  }
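
The core of the load is a DOM parse followed by a walk that turns table rows and cells into cell coordinates. The sketch below is a heavily simplified illustration of that idea, not the method's actual logic: it ignores nesting, spans, and every non-table element. The function name htmlTableToCells is hypothetical.

```php
<?php
// Hypothetical, simplified walker: maps each <td>/<th> to an A1-style
// coordinate, one spreadsheet row per <tr>.
function htmlTableToCells($html)
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = array();
    $row = 1;
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $column = 'A';
        foreach ($tr->childNodes as $child) {
            if ($child instanceof DOMElement
                && in_array($child->nodeName, array('td', 'th'), true)) {
                // Collapse whitespace runs, as _processDomElement() does for text nodes
                $text = preg_replace('/\s+/u', ' ', trim($child->textContent));
                if ($text !== '') {
                    $cells[$column . $row] = $text;
                }
                ++$column; // next column letter
            }
        }
        ++$row;
    }

    return $cells;
}

print_r(htmlTableToCells(
    '<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr></table>'
));
```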

◆ securityScan()

PHPExcel_Reader_HTML::securityScan (   $xml)

Scan the XML for use of <!ENTITY to prevent XXE/XEE attacks.

Parameters
string $xml
Exceptions
PHPExcel_Reader_Exception

Definition at line 524 of file HTML.php.

References $xml.

525  {
526  $pattern = '/\\0?' . implode('\\0?', str_split('<!ENTITY')) . '\\0?/';
527  if (preg_match($pattern, $xml)) {
528  throw new PHPExcel_Reader_Exception('Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
529  }
530  return $xml;
531  }
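
The pattern built in securityScan() allows an optional NUL byte before and after every character of <!ENTITY, so the declaration is also detected when each character is followed by a zero byte, as in UTF-16 text. The sketch below demonstrates this with a boolean helper. The name containsEntityDeclaration is hypothetical, and unlike the method it returns a flag instead of throwing.

```php
<?php
// Hypothetical boolean variant of the check performed by securityScan().
function containsEntityDeclaration($xml)
{
    // Same construction as in the method: "\0?" between every character
    $pattern = '/\\0?' . implode('\\0?', str_split('<!ENTITY')) . '\\0?/';

    return preg_match($pattern, $xml) === 1;
}

$plain = '<!DOCTYPE r [<!ENTITY x "y">]>';
// Crude imitation of a UTF-16LE byte layout: a NUL after every character
$wide = implode("\0", str_split('<!ENTITY x>')) . "\0";

var_dump(containsEntityDeclaration($plain));        // plain declaration
var_dump(containsEntityDeclaration($wide));         // NUL-interleaved declaration
var_dump(containsEntityDeclaration('<p>safe</p>')); // no declaration
```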

◆ setInputEncoding()

PHPExcel_Reader_HTML::setInputEncoding (   $pValue = 'ANSI')

Set input encoding.

Parameters
string $pValue Input encoding

Definition at line 152 of file HTML.php.

153  {
154  $this->_inputEncoding = $pValue;
155 
156  return $this;
157  }

◆ setSheetIndex()

PHPExcel_Reader_HTML::setSheetIndex (   $pValue = 0)

Set sheet index.

Parameters
int $pValue Sheet index
Returns
PHPExcel_Reader_HTML

Definition at line 511 of file HTML.php.

512  {
513  $this->_sheetIndex = $pValue;
514 
515  return $this;
516  }

Field Documentation

◆ $_dataArray

PHPExcel_Reader_HTML::$_dataArray = array()
protected

Definition at line 170 of file HTML.php.

◆ $_formats

PHPExcel_Reader_HTML::$_formats
protected

Definition at line 66 of file HTML.php.

◆ $_inputEncoding

PHPExcel_Reader_HTML::$_inputEncoding = 'ANSI'
protected

Definition at line 52 of file HTML.php.

Referenced by getInputEncoding().

◆ $_nestedColumn

PHPExcel_Reader_HTML::$_nestedColumn = array('A')
protected

Definition at line 172 of file HTML.php.

◆ $_sheetIndex

PHPExcel_Reader_HTML::$_sheetIndex = 0
protected

Definition at line 59 of file HTML.php.

Referenced by getSheetIndex(), and loadIntoExisting().

◆ $_tableLevel

PHPExcel_Reader_HTML::$_tableLevel = 0
protected

Definition at line 171 of file HTML.php.

Referenced by _getTableStartColumn(), _releaseTableStartColumn(), and _setTableStartColumn().

◆ $rowspan

PHPExcel_Reader_HTML::$rowspan = array()
protected

Definition at line 104 of file HTML.php.


The documentation for this class was generated from the following file: