ILIAS  release_5-1 Revision 5.0.0-5477-g43f3e3fab5f
Auth_OpenID_Parse Class Reference

This module implements a VERY limited parser that finds <link> tags in the head of HTML or XHTML documents and parses out their attributes according to the OpenID spec.

Public Member Functions

 Auth_OpenID_Parse ()
 
 tagMatcher ($tag_name, $close_tags=null)
 Returns a regular expression that will match a given tag in an SGML string.
 
 openTag ($tag_name)
 
 closeTag ($tag_name)
 
 htmlBegin ($s)
 
 htmlEnd ($s)
 
 headFind ()
 
 replaceEntities ($str)
 
 removeQuotes ($str)
 
 match ($regexp, $text, &$match)
 
 parseLinkAttrs ($html)
 Find all link tags in a string representing an HTML document and return a list of their attributes.
 
 relMatches ($rel_attr, $target_rel)
 
 linkHasRel ($link_attrs, $target_rel)
 
 findLinksRel ($link_attrs_list, $target_rel)
 
 findFirstHref ($link_attrs_list, $target_rel)
 

Data Fields

 $_re_flags = "si"
 Specify some flags for use with regex matching.
 
 $_removed_re
 Stuff to remove before we start looking for tags.
 
 $_tag_expr = "<%s\b(?!:)([^>]*?)(?:\/>|>(.*)(?:<\/?%s\s*>|\Z))"
 Starts with the tag name at a word boundary, where the tag name is not a namespace.
 
 $_attr_find = '\b(\w+)=("[^"]*"|\'[^\']*\'|[^\'"\s\/<>]+)'
 
 $_open_tag_expr = "<%s\b"
 
 $_close_tag_expr = "<((\/%s\b)|(%s[^>\/]*\/))>"
 

Detailed Description

This module implements a VERY limited parser that finds <link> tags in the head of HTML or XHTML documents and parses out their attributes according to the OpenID spec.

It is a liberal parser, but it requires these things from the data in order to work:

  • There must be an open <html> tag
  • There must be an open <head> tag inside of the <html> tag
  • Only <link>s that are found inside of the <head> tag are parsed (this is by design)
  • The parser follows the OpenID specification in resolving the attributes of the link tags. This means that the attributes DO NOT get resolved as they would by an XML or HTML parser. In particular, only certain entities get replaced, and href attributes do not get resolved relative to a base URL.

From http://openid.net/specs.bml:

  • The openid.server URL MUST be an absolute URL. OpenID consumers MUST NOT attempt to resolve relative URLs.
  • The openid.server URL MUST NOT include entities other than &amp;, &lt;, &gt;, and &quot;.
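
The restricted entity handling described above can be sketched as follows. This is a minimal illustration, not the class's own code: the function name replace_openid_entities is hypothetical, and it uses str_replace() where the class uses preg_replace(). Only the four entities allowed by the spec are decoded; anything else is left as-is.

```php
<?php
// Hypothetical standalone helper: decode only the four entities the spec allows.
function replace_openid_entities($str)
{
    $replacements = array(
        'amp'  => '&',
        'lt'   => '<',
        'gt'   => '>',
        'quot' => '"',
    );
    foreach ($replacements as $name => $char) {
        $str = str_replace('&' . $name . ';', $char, $str);
    }
    return $str;
}

// &amp; is decoded; &nbsp; is left untouched because it is not in the table.
$href = replace_openid_entities('http://example.com/server?a=1&amp;b=2&nbsp;');
```

As in the class, the replacements run one after another, so the order of the table matters for inputs such as &amp;lt;.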

The parser ignores SGML comments and <![CDATA[blocks]]>. Both kinds of quoting (single and double) are allowed for attributes.

The parser deals with invalid markup in these ways:

  • Tag names are not case-sensitive
  • The <html> tag is accepted even when it is not at the top level
  • The <head> tag is accepted even when it is not a direct child of the <html> tag, but a <html> tag must be an ancestor of the <head> tag
  • <link> tags are accepted even when they are not direct children of the <head> tag, but a <head> tag must be an ancestor of the <link> tag
  • If there is no closing tag for an open <html> or <head> tag, the remainder of the document is treated as being inside the tag. If there is no closing tag for a <link> tag, the link tag is treated as a short tag. The exceptions are that an <html> tag closes <html>, and a <body> or <head> tag closes <head>.
  • Attributes of the <link> tag are not required to be quoted.
  • In the case of duplicated attribute names, the attribute coming last in the tag will be the value returned.
  • Any text that does not parse as an attribute within a link tag will be ignored. (e.g. <link pumpkin rel='openid.server' > will ignore pumpkin)
  • If there is more than one <html> or <head> tag, the parser only looks inside the first one.
  • The contents of <script> tags are ignored entirely, except for unclosed <script> tags: an unclosed <script> tag is itself ignored, so the markup that follows it is still parsed.
  • Any other invalid markup is ignored, including unclosed SGML comments and unclosed <![CDATA[ blocks.
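
The attribute rules in the list above (unquoted values, last duplicate wins, unparseable text ignored) can be sketched with the $_attr_find pattern documented for this class. The input string and variable names are hypothetical, and the quote stripping is a compact stand-in for removeQuotes(), not the class's own code.

```php
<?php
// The pattern from $_attr_find, wrapped with "/" delimiters and the "si" flags.
$attr_find = '/\b(\w+)=("[^"]*"|\'[^\']*\'|[^\'"\s\/<>]+)/si';

// Hypothetical input: a stray word, an unquoted value, and a duplicated attribute.
$link = "<link pumpkin rel=openid.server href='http://a.example/' HREF=\"http://b.example/\">";

$attrs = array();
preg_match_all($attr_find, $link, $m);
foreach ($m[1] as $i => $name) {
    // Strip one layer of matching quotes, if present.
    $value = preg_replace('/^(["\'])(.*)\1$/s', '$2', $m[2][$i]);
    // Names are lower-cased, so the later HREF overwrites the earlier href.
    $attrs[strtolower($name)] = $value;
}
// "pumpkin" has no "=value" part, so it never appears in $attrs.
```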

PHP versions 4 and 5

LICENSE: See the COPYING file included in this distribution.

@access private @package OpenID

Author
JanRain, Inc. <openid@janrain.com>

Definition at line 87 of file Parse.php.

Member Function Documentation

◆ Auth_OpenID_Parse()

Auth_OpenID_Parse::Auth_OpenID_Parse ( )

Definition at line 111 of file Parse.php.

{
    $this->_link_find = sprintf("/<link\b(?!:)([^>]*)(?!<)>/%s",
                                $this->_re_flags);

    $this->_entity_replacements = array(
        'amp' => '&',
        'lt' => '<',
        'gt' => '>',
        'quot' => '"'
    );

    $this->_attr_find = sprintf("/%s/%s",
                                $this->_attr_find,
                                $this->_re_flags);

    $this->_removed_re = sprintf("/%s/%s",
                                 $this->_removed_re,
                                 $this->_re_flags);

    // Note: implode() joins the array values (the replacement characters),
    // not the entity names.
    $this->_ent_replace =
        sprintf("&(%s);", implode("|",
                                  $this->_entity_replacements));
}

◆ closeTag()

Auth_OpenID_Parse::closeTag (   $tag_name)

Definition at line 161 of file Parse.php.

{
    $expr = sprintf($this->_close_tag_expr, $tag_name, $tag_name);
    return sprintf("/%s/%s", $expr, $this->_re_flags);
}

Referenced by htmlEnd().

◆ findFirstHref()

Auth_OpenID_Parse::findFirstHref (   $link_attrs_list,
  $target_rel 
)

Definition at line 344 of file Parse.php.

{
    // Return the value of the href attribute for the first link
    // tag in the list that has target_rel as a relationship.
    // XXX: TESTME
    $matches = $this->findLinksRel($link_attrs_list,
                                   $target_rel);
    if (!$matches) {
        return null;
    }
    $first = $matches[0];
    return Auth_OpenID::arrayGet($first, 'href', null);
}

References Auth_OpenID\arrayGet(), and findLinksRel().
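
A minimal sketch of the lookup that findFirstHref() performs, written as a standalone function so it runs without the library. The function name first_href and the sample data are hypothetical; the attribute arrays are assumed to have the shape returned by parseLinkAttrs().

```php
<?php
// Hypothetical attribute lists, shaped like the return value of parseLinkAttrs().
$links = array(
    array('rel' => 'stylesheet', 'href' => '/site.css'),
    array('rel' => 'openid.server openid.delegate', 'href' => 'http://example.com/server'),
    array('rel' => 'openid.server', 'href' => 'http://example.org/other'),
);

// Find the first link whose rel contains the target token and return its href.
function first_href($links, $target_rel)
{
    foreach ($links as $attrs) {
        $rels = isset($attrs['rel'])
            ? preg_split('/\s+/', strtolower(trim($attrs['rel'])))
            : array();
        if (in_array($target_rel, $rels, true)) {
            // A matching link without an href yields null, as in the class.
            return isset($attrs['href']) ? $attrs['href'] : null;
        }
    }
    return null;
}

$server = first_href($links, 'openid.server');   // second entry wins, not the third
$none   = first_href($links, 'openid.missing');  // no match
```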

◆ findLinksRel()

Auth_OpenID_Parse::findLinksRel (   $link_attrs_list,
  $target_rel 
)

Definition at line 329 of file Parse.php.

{
    // Filter the list of link attributes on whether it has
    // target_rel as a relationship.
    // XXX: TESTME
    $result = array();
    foreach ($link_attrs_list as $attr) {
        if ($this->linkHasRel($attr, $target_rel)) {
            $result[] = $attr;
        }
    }

    return $result;
}

References $result, and linkHasRel().

Referenced by findFirstHref().

◆ headFind()

Auth_OpenID_Parse::headFind ( )

Definition at line 191 of file Parse.php.

{
    return $this->tagMatcher('head', array('body', 'html'));
}

References tagMatcher().

Referenced by parseLinkAttrs().

◆ htmlBegin()

Auth_OpenID_Parse::htmlBegin (   $s)

Definition at line 167 of file Parse.php.

{
    $matches = array();
    $result = preg_match($this->openTag('html'), $s,
                         $matches, PREG_OFFSET_CAPTURE);
    if ($result === false || !$matches) {
        return false;
    }
    // Return the offset of the first match.
    return $matches[0][1];
}

References $result, and openTag().

Referenced by parseLinkAttrs().

◆ htmlEnd()

Auth_OpenID_Parse::htmlEnd (   $s)

Definition at line 179 of file Parse.php.

{
    $matches = array();
    $result = preg_match($this->closeTag('html'), $s,
                         $matches, PREG_OFFSET_CAPTURE);
    if ($result === false || !$matches) {
        return false;
    }
    // Return the offset of the last captured group in the match.
    return $matches[count($matches) - 1][1];
}

References $result, and closeTag().

Referenced by parseLinkAttrs().

◆ linkHasRel()

Auth_OpenID_Parse::linkHasRel (   $link_attrs,
  $target_rel 
)

Definition at line 320 of file Parse.php.

{
    // Does this link have target_rel as a relationship?
    // XXX: TESTME
    $rel_attr = Auth_OpenID::arrayGet($link_attrs, 'rel', null);
    return ($rel_attr && $this->relMatches($rel_attr,
                                           $target_rel));
}

References relMatches().

Referenced by findLinksRel().

◆ match()

Auth_OpenID_Parse::match (   $regexp,
  $text,
  &$match 
)

Definition at line 219 of file Parse.php.

{
    // Fall back to PCRE when the mbstring regex functions are unavailable.
    if (!is_callable('mb_ereg_search_init')) {
        return preg_match($regexp, $text, $match);
    }

    // Strip the leading "/" and the trailing "/" plus flags from the pattern.
    $regexp = substr($regexp, 1, strlen($regexp) - 2 - strlen($this->_re_flags));
    mb_ereg_search_init($text);
    if (!mb_ereg_search($regexp)) {
        return false;
    }
    list($match) = mb_ereg_search_getregs();
    return true;
}

References $text.

Referenced by parseLinkAttrs().
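
The substr() arithmetic in match() removes the PCRE delimiters and flags before the pattern is handed to the mbstring functions. A minimal sketch with a hypothetical pattern shows the lengths involved:

```php
<?php
// Hypothetical values: a delimited pattern built the same way openTag() builds one.
$re_flags = "si";
$regexp   = sprintf('/%s/%s', '<head\b', $re_flags);   // "/<head\b/si", 11 characters

// Start at offset 1 to skip the leading "/". The length drops that "/",
// the trailing "/", and strlen($re_flags) flag characters: 11 - 2 - 2 = 7.
$bare = substr($regexp, 1, strlen($regexp) - 2 - strlen($re_flags));
```

Note that the flags are discarded on this path rather than translated, so case-insensitive matching then depends on whatever mbstring regex options are in effect.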

◆ openTag()

Auth_OpenID_Parse::openTag (   $tag_name)

Definition at line 155 of file Parse.php.

{
    $expr = sprintf($this->_open_tag_expr, $tag_name);
    return sprintf("/%s/%s", $expr, $this->_re_flags);
}

Referenced by htmlBegin().

◆ parseLinkAttrs()

Auth_OpenID_Parse::parseLinkAttrs (   $html)

Find all link tags in a string representing an HTML document and return a list of their attributes.

Todo:
This is quite inefficient and may fail with the default pcre.backtrack_limit of 100000 in PHP 5.2 if $html is large. It would be better to use stripos() (PHP 5) or strpos() with strtoupper() (PHP 4) for this.
Parameters
string $html The text to parse
Returns
array $list An array of arrays of attributes, one for each link tag

Definition at line 247 of file Parse.php.

{
    $stripped = preg_replace($this->_removed_re,
                             "",
                             $html);

    $html_begin = $this->htmlBegin($stripped);
    $html_end = $this->htmlEnd($stripped);

    if ($html_begin === false) {
        return array();
    }

    if ($html_end === false) {
        $html_end = strlen($stripped);
    }

    $stripped = substr($stripped, $html_begin,
                       $html_end - $html_begin);

    // Workaround to prevent PREG_BACKTRACK_LIMIT_ERROR:
    $old_btlimit = ini_set( 'pcre.backtrack_limit', -1 );

    // Try to find the <HEAD> tag.
    $head_re = $this->headFind();
    $head_match = '';
    if (!$this->match($head_re, $stripped, $head_match)) {
        ini_set( 'pcre.backtrack_limit', $old_btlimit );
        return array();
    }

    $link_data = array();
    $link_matches = array();

    if (!preg_match_all($this->_link_find, $head_match,
                        $link_matches)) {
        ini_set( 'pcre.backtrack_limit', $old_btlimit );
        return array();
    }

    foreach ($link_matches[0] as $link) {
        $attr_matches = array();
        preg_match_all($this->_attr_find, $link, $attr_matches);
        $link_attrs = array();
        foreach ($attr_matches[0] as $index => $full_match) {
            $name = $attr_matches[1][$index];
            $value = $this->replaceEntities(
                $this->removeQuotes($attr_matches[2][$index]));

            $link_attrs[strtolower($name)] = $value;
        }
        $link_data[] = $link_attrs;
    }

    ini_set( 'pcre.backtrack_limit', $old_btlimit );
    return $link_data;
}

References $html, headFind(), htmlBegin(), htmlEnd(), match(), removeQuotes(), and replaceEntities().
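
The Todo above suggests locating tags with stripos() rather than a regular expression. A minimal sketch of that idea for the opening <html> tag follows. The function name html_begin_stripos is hypothetical and is not part of the class; it assumes PHP 5 or later, where stripos() is available.

```php
<?php
// Hypothetical alternative to htmlBegin(): find "<html" without the "<html\b" regex.
function html_begin_stripos($s)
{
    $offset = 0;
    while (($pos = stripos($s, '<html', $offset)) !== false) {
        // Emulate the \b after the tag name: the next character must be
        // absent or a non-word character, so "<htmlx" is rejected.
        $next = substr($s, $pos + 5, 1);
        if ($next === '' || $next === false || !preg_match('/^\w$/', $next)) {
            return $pos;
        }
        $offset = $pos + 1;
    }
    return false;
}

$begin = html_begin_stripos("<!DOCTYPE html>\n<HTML lang='en'><head></head></HTML>");
$miss  = html_begin_stripos("<htmlx>no real html tag</htmlx>");
```

Like htmlBegin(), the sketch returns a byte offset on success and false when no opening tag is found.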

◆ relMatches()

Auth_OpenID_Parse::relMatches (   $rel_attr,
  $target_rel 
)

Definition at line 305 of file Parse.php.

{
    // Does this target_rel appear in the rel_str?
    // XXX: TESTME
    $rels = preg_split("/\s+/", trim($rel_attr));
    foreach ($rels as $rel) {
        $rel = strtolower($rel);
        if ($rel == $target_rel) {
            return 1;
        }
    }

    return 0;
}

Referenced by linkHasRel().
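
A minimal standalone sketch of the token comparison in relMatches(). The function name rel_matches is hypothetical, and it returns booleans rather than the 1 and 0 used by the class. The rel value is split on whitespace and each token is lower-cased before an exact comparison, so substrings do not match.

```php
<?php
// Hypothetical standalone version of the rel token comparison.
function rel_matches($rel_attr, $target_rel)
{
    foreach (preg_split('/\s+/', trim($rel_attr)) as $rel) {
        if (strtolower($rel) === $target_rel) {
            return true;
        }
    }
    return false;
}

$a = rel_matches("  OpenID.Server  openid.delegate ", "openid.server"); // token match
$b = rel_matches("openid.serverx", "openid.server");                    // substring only
```

Only the tokens from the document are lower-cased, so callers are expected to pass $target_rel in lower case.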

◆ removeQuotes()

Auth_OpenID_Parse::removeQuotes (   $str)

Definition at line 204 of file Parse.php.

{
    $matches = array();
    $double = '/^"(.*)"$/';
    $single = "/^\'(.*)\'$/";

    if (preg_match($double, $str, $matches)) {
        return $matches[1];
    } else if (preg_match($single, $str, $matches)) {
        return $matches[1];
    } else {
        return $str;
    }
}

Referenced by parseLinkAttrs().

◆ replaceEntities()

Auth_OpenID_Parse::replaceEntities (   $str)

Definition at line 196 of file Parse.php.

{
    foreach ($this->_entity_replacements as $old => $new) {
        $str = preg_replace(sprintf("/&%s;/", $old), $new, $str);
    }
    return $str;
}

Referenced by parseLinkAttrs().

◆ tagMatcher()

Auth_OpenID_Parse::tagMatcher (   $tag_name,
  $close_tags = null 
)

Returns a regular expression that will match a given tag in an SGML string.

Definition at line 140 of file Parse.php.

{
    $expr = $this->_tag_expr;

    if ($close_tags) {
        $options = implode("|", array_merge(array($tag_name), $close_tags));
        $closer = sprintf("(?:%s)", $options);
    } else {
        $closer = $tag_name;
    }

    $expr = sprintf($expr, $tag_name, $closer);
    return sprintf("/%s/%s", $expr, $this->_re_flags);
}

References $_tag_expr, and $options.

Referenced by headFind().
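
To make the two sprintf() steps concrete, the sketch below rebuilds the pattern that headFind() requests, using the $_tag_expr template and $_re_flags value documented on this page. The variable names and the sample document are hypothetical.

```php
<?php
// Template and flags copied from $_tag_expr and $_re_flags.
$tag_expr = '<%s\b(?!:)([^>]*?)(?:\/>|>(.*)(?:<\/?%s\s*>|\Z))';
$re_flags = 'si';

// headFind() asks for 'head' with the extra closers 'body' and 'html'.
$closer  = sprintf('(?:%s)', implode('|', array('head', 'body', 'html')));
$head_re = sprintf('/%s/%s', sprintf($tag_expr, 'head', $closer), $re_flags);

// The "i" flag makes tag names case-insensitive.
$matched = preg_match($head_re, '<HTML><HEAD profile="x"></HEAD></HTML>');
```

When no $close_tags are given, the tag name itself is used as the only closer.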

Field Documentation

◆ $_attr_find

Auth_OpenID_Parse::$_attr_find = '\b(\w+)=("[^"]*"|\'[^\']*\'|[^\'"\s\/<>]+)'

Definition at line 106 of file Parse.php.

◆ $_close_tag_expr

Auth_OpenID_Parse::$_close_tag_expr = "<((\/%s\b)|(%s[^>\/]*\/))>"

Definition at line 109 of file Parse.php.

◆ $_open_tag_expr

Auth_OpenID_Parse::$_open_tag_expr = "<%s\b"

Definition at line 108 of file Parse.php.

◆ $_re_flags

Auth_OpenID_Parse::$_re_flags = "si"

Specify some flags for use with regex matching.

Definition at line 92 of file Parse.php.

◆ $_removed_re

Auth_OpenID_Parse::$_removed_re = "<!--.*?-->|<!\[CDATA\[.*?\]\]>|<script\b(?!:)[^>]*>.*?<\/script>"

Stuff to remove before we start looking for tags.

Definition at line 97 of file Parse.php.
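
A minimal sketch of what this pattern strips, using the same delimiters and flags the constructor applies. The input document is hypothetical.

```php
<?php
// Pattern copied from $_removed_re, wrapped with "/" delimiters and the "si" flags.
$removed_re = '/<!--.*?-->|<!\[CDATA\[.*?\]\]>|<script\b(?!:)[^>]*>.*?<\/script>/si';

// Hypothetical input: a commented-out link and a script body containing "<link>".
$html = "<head><!-- <link rel='x'> --><script>var a = '<link>';</script>"
      . "<link rel='openid.server' href='http://example.com/'></head>";

// Comments, CDATA sections, and closed script elements are removed before
// any tag matching, so only the real <link> survives.
$stripped = preg_replace($removed_re, "", $html);
```

An unclosed <script> tag does not match the third alternative, so it and the text after it are left in place.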

◆ $_tag_expr

Auth_OpenID_Parse::$_tag_expr = "<%s\b(?!:)([^>]*?)(?:\/>|>(.*)(?:<\/?%s\s*>|\Z))"

Starts with the tag name at a word boundary, where the tag name is not a namespace.

Definition at line 104 of file Parse.php.

Referenced by tagMatcher().


The documentation for this class was generated from the following file: