parent 2bae9db309
commit e9adbbd345
@@ -0,0 +1,62 @@
= 0.6
=== 15th June, 2007
* Hpricot for JRuby -- nice work Ola Bini!
* Inline Markaby for Hpricot documents.
* XML tags and attributes are no longer downcased like HTML is.
* new syntax for grabbing everything between two elements using a Range in the search method: (doc/("font".."font/br")) or in nodes_at like so: (doc/"font").nodes_at("*".."br"). Only works with either a pair of siblings or a set of a parent and a sibling.
* Ignore self-closing endings on tags (such as form) which are containers. Treat them like open parent tags. Reported by Jonathan Nichols on the hpricot list.
* Escaping of attributes, yanked from Jim Weirich and Sam Ruby's work in Builder.
* Element#raw_attributes gives unescaped data. Element#attributes gives escaped.
* Added: Elements#attr, Elements#remove_attr, Elements#remove_class.
* Added: Traverse#preceding, Traverse#following, Traverse#previous, Traverse#next.

= 0.5
=== 31st January, 2007

* support for a[text()="Click Me!"] and h3[text()*="space"] and the like.
* Hpricot.buffer_size accessor for increasing Hpricot's buffer if you're encountering huge ASP.NET viewstate attribs.
* some support for colons in tag names (not full namespace support yet.)
* Element.to_original_html will attempt to preserve the original HTML while merging your changes.
* Element.to_plain_text converts an element's contents to a simple text format.
* Element.inner_text removes all tags and returns text nodes concatenated into a single string.
* no @raw_string variable kept for comments, text, and cdata -- as it's redundant.
* xpath-style indices (//p/a[1]) but keep in mind that they aren't zero-based.
* node_position is the index among all sibling nodes, while position is the position among children of identical type.
* comment() and text() search criteria, like: //p/text(), which selects all text inside paragraph tags.
* every element has css_path and xpath methods which return respective absolute paths.
* more flexibility all around: in parsing attributes, tags, comments and cdata.

= 0.4
=== 11th August, 2006

* The :fixup_tags option will try to sort out the hierarchy so elements end up with the right parents.
* Elements such as *script* and *style* (identified as having CDATA contents) receive a single text node as their children now. Previously, Hpricot was parsing out tags found in scripts.
* Better scanning of partially quoted attributes (found by Brent Beardsly on http://uswebgen.com/)
* Better scanning of unquoted attributes -- thanks to Aaron Patterson for the test cases!
* Some tags were being output in the empty tag style, although browsers hated that. FIXED!
* Added Elements#at for finding single elements.
* Added Elem::Trav#[] and Elem::Trav#[]= for reading and writing attributes.

= 0.3
=== 7th July, 2006

* Fixed negative string size error on empty tokens. (news.bbc.co.uk)
* Allow the parser to accept just text nodes. (such as: <tt>Hpricot.parse('TEXT')</tt>)
* from JQuery to Hpricot::Elements: remove, empty, append, prepend, before, after, wrap, set,
  html(...), to_html, to_s.
* on containers: to_html, replace_child, insert_before, insert_after, innerHTML=.
* Hpricot(...) is an alias for parse.
* open up all properties to setters, let people do as they may.
* use to_html for the full html of a node or set of elements.
* doctypes were messed.

= 0.2
=== 4th July, 2006

* Rewrote the HTree parser to be simpler, more adequate for the common man. Will add encoding back in later.

= 0.1
=== 3rd July, 2006

* For whatever reason, wrote this HTML parser in C.
  I guess Ragel is addictive and I want to improve HTree.
@@ -0,0 +1,18 @@
Copyright (c) 2006 why the lucky stiff

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,284 @@
= Hpricot, Read Any HTML

Hpricot is a fast, flexible HTML parser written in C. It's designed to be very
accommodating (like Tanaka Akira's HTree) and to have a very helpful library
(like some JavaScript libs -- JQuery, Prototype -- give you.) The XPath and CSS
parser, in fact, is based on John Resig's JQuery.

Also, Hpricot can be handy for reading broken XML files, since many of the same
techniques can be used. If a quote is missing, Hpricot tries to figure it out.
If tags overlap, Hpricot works on sorting them out. You know, that sort of
thing.

*Please read this entire document* before making assumptions about how this
software works.

== An Overview

Let's clear up what Hpricot is.

# Hpricot is *a standalone library*. It requires no other libraries. Just Ruby!
# While priding itself on speed, Hpricot *works hard to sort out bad HTML* and
  pays a small penalty in order to get that right. So that's slightly more important
  to me than speed.
# *If you can see it in Firefox, then Hpricot should parse it.* That's
  how it should be! Let me know the minute it's otherwise.
# Primarily, Hpricot is used for reading HTML and tries to sort out troubled
  HTML by having some idea of what good HTML is. Some people still like to use
  Hpricot for XML reading, but *remember to use the Hpricot::XML() method* for that!

== The Hpricot Kingdom

First, here are all the links you need to know:

* http://code.whytheluckystiff.net/hpricot is the Hpricot wiki and bug tracker.
  Go there for news and recipes and patches. It's the center of activity.
* http://code.whytheluckystiff.net/svn/hpricot/trunk is the main Subversion
  repository for Hpricot. You can get the latest code there.
* http://code.whytheluckystiff.net/doc/hpricot is the home for the latest copy of
  this reference.
* See COPYING for the terms of this software. (Spoiler: it's absolutely free.)

If you have any trouble, don't hesitate to contact the author. As always, I'm
not going to say "Use at your own risk" because I don't want this library to be
risky. If you trip on something, I'll share the liability by repairing things
as quickly as I can. Your responsibility is to report the inadequacies.

== Installing Hpricot

You may get the latest stable version from Rubyforge. Win32 binaries and source
gems are available.

  $ gem install hpricot

As Hpricot is still under active development, you can also try the most recent
candidate build here:

  $ gem install hpricot --source http://code.whytheluckystiff.net

The development gem is usually in pretty good shape actually. You can also
get the bleeding edge code or plain Ruby tarballs on the wiki.

== An Hpricot Showcase

We're going to run through a big pile of examples to get you jump-started.
Many of these examples are also found at
http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics, in case you
want to add some of your own.

=== Loading Hpricot Itself

You have probably got the gem, right? To load Hpricot:

  require 'rubygems'
  require 'hpricot'

If you've installed the plain source distribution, go ahead and just:

  require 'hpricot'

=== Load an HTML Page

The <tt>Hpricot()</tt> method takes a string or any IO object and loads the
contents into a document object.

  doc = Hpricot("<p>A simple <b>test</b> string.</p>")

To load from a file, just get the stream open:

  doc = open("index.html") { |f| Hpricot(f) }

To load from a web URL, use <tt>open-uri</tt>, which comes with Ruby:

  require 'open-uri'
  doc = open("http://qwantz.com/") { |f| Hpricot(f) }

Hpricot uses an internal buffer to parse the file, so the IO will stream
properly and large documents won't be loaded into memory all at once. However,
the parsed document object will be present in memory, in its entirety.
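
The buffering idea can be pictured without Hpricot at all. What follows is only a sketch under stated assumptions: <tt>each_chunk</tt> is a hypothetical helper, not part of Hpricot, and the 16384-byte default merely mirrors the BUFSIZE constant defined in hpricot_scan.h in this commit. It shows an IO being consumed in fixed-size pieces rather than slurped in one read.

```ruby
require 'stringio'

# Hypothetical helper, not part of Hpricot: yields an IO's contents in
# fixed-size pieces so the whole input never has to sit in one string.
def each_chunk(io, buffer_size = 16384)
  # IO#read(n) returns nil at end of input, which ends the loop.
  while (chunk = io.read(buffer_size))
    yield chunk
  end
end

# A StringIO stands in for a file or socket here.
io = StringIO.new("<p>A simple <b>test</b> string.</p>" * 1000)
sizes = []
each_chunk(io) { |chunk| sizes << chunk.bytesize }
puts "read #{sizes.length} chunks, largest #{sizes.max} bytes"
```

Any object that responds to <tt>read</tt> would work in place of the StringIO, which is presumably why a string or any IO object is accepted.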

=== Search for Elements

Use <tt>Doc.search</tt>:

  doc.search("//p[@class='posted']")
  #=> #<Hpricot::Elements[{p ...}, {p ...}]>

<tt>Doc.search</tt> can take an XPath or CSS expression. In the above example,
all paragraph <tt><p></tt> elements are grabbed which have a <tt>class</tt>
attribute of <tt>"posted"</tt>.

A shortcut is to use the divisor:

  (doc/"p.posted")
  #=> #<Hpricot::Elements[{p ...}, {p ...}]>

=== Finding Just One Element

If you're looking for a single element, the <tt>at</tt> method will return the
first element matched by the expression. In this case, you'll get back the
element itself rather than the <tt>Hpricot::Elements</tt> array.

  doc.at("body")['onload']

The above code will find the body tag and give you back the <tt>onload</tt>
attribute. This is the most common reason to use the element directly: when
reading and writing HTML attributes.

=== Fetching the Contents of an Element

Just as with browser scripting, the <tt>inner_html</tt> property can be used to
get the inner contents of an element.

  (doc/"#elementID").inner_html
  #=> "..<b>contents</b>.."

If your expression matches more than one element, you'll get back the contents
of <em>all the matched elements</em>. So you may want to use <tt>first</tt> to be
sure you get back only one.

  (doc/"#elementID").first.inner_html
  #=> "..<b>contents</b>.."

=== Fetching the HTML for an Element

If you want the HTML for the whole element (not just the contents), use
<tt>to_html</tt>:

  (doc/"#elementID").to_html
  #=> "<div id='elementID'>...</div>"

=== Looping

All searches return a set of <tt>Hpricot::Elements</tt>. Go ahead and loop
through them like you would an array.

  (doc/"p/a/img").each do |img|
    puts img.attributes['class']
  end

=== Continuing Searches

Searches can be continued from a collection of elements, in order to search deeper.

  # find all paragraphs.
  elements = doc.search("/html/body//p")
  # continue the search by finding any images within those paragraphs.
  (elements/"img")
  #=> #<Hpricot::Elements[{img ...}, {img ...}]>

Searches can also be continued by searching within container elements.

  # find all images within paragraphs.
  doc.search("/html/body//p").each do |para|
    puts "== Found a paragraph =="
    pp para

    imgs = para.search("img")
    if imgs.any?
      puts "== Found #{imgs.length} images inside =="
    end
  end

Of course, the most succinct ways to do the above are using CSS or XPath.

  # the xpath version
  (doc/"/html/body//p//img")
  # the css version
  (doc/"html > body > p img")
  # ..or symbols work, too!
  (doc/:html/:body/:p/:img)

=== Looping Edits

You may certainly edit objects from within your search loops. Then, when you
spit out the HTML, the altered elements will show.

  (doc/"span.entryPermalink").each do |span|
    span.attributes['class'] = 'newLinks'
  end
  puts doc

This changes all <tt>span.entryPermalink</tt> elements to
<tt>span.newLinks</tt>. Keep in mind that there are often more convenient ways
of doing this. Such as the <tt>set</tt> method:

  (doc/"span.entryPermalink").set(:class => 'newLinks')

=== Figuring Out Paths

Every element can tell you its unique path (either XPath or CSS) to get to the
element from the root tag.

The <tt>css_path</tt> method:

  doc.at("div > div:nth(1)").css_path
  #=> "div > div:nth(1)"
  doc.at("#header").css_path
  #=> "#header"

Or, the <tt>xpath</tt> method:

  doc.at("div > div:nth(1)").xpath
  #=> "/div/div:eq(1)"
  doc.at("#header").xpath
  #=> "//div[@id='header']"
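
To make the idea of an absolute path concrete, here is a minimal sketch of how such a path could be derived by walking from an element up to the root. Everything in it is an assumption made for illustration: <tt>Node</tt> is a hypothetical stand-in for a parsed element, and the <tt>tag[n]</tt> step format is generic XPath with 1-based indices, not the exact strings Hpricot returns.

```ruby
# Hypothetical stand-in for a parsed element; Hpricot's own classes differ.
Node = Struct.new(:name, :parent, :children) do
  # Create a child element and attach it to this node.
  def add(name)
    child = Node.new(name, self, [])
    children << child
    child
  end

  # 1-based position among siblings that share this tag name.
  def position
    return 1 unless parent
    same = parent.children.select { |c| c.name == name }
    same.index { |c| c.equal?(self) } + 1
  end

  # Walk up to the root, collecting one step per ancestor.
  def absolute_path
    steps = []
    node = self
    while node
      steps.unshift("#{node.name}[#{node.position}]")
      node = node.parent
    end
    "/" + steps.join("/")
  end
end

html = Node.new("html", nil, [])
body = html.add("body")
body.add("div")
second = body.add("div")
puts second.absolute_path
```

The position is counted among siblings of the same type, which matches the distinction the changelog draws between <tt>position</tt> and <tt>node_position</tt>.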

== Hpricot Fixups

When loading HTML documents, you have a few settings that can make Hpricot more
or less intense about how it gets involved.

== :fixup_tags

Really, there are so many ways to clean up HTML and your intentions may be to
keep the HTML as-is. So Hpricot's default behavior is to keep things flexible.
Making sure to open and close all the tags, but ignore any validation problems.

As of Hpricot 0.4, there's a new <tt>:fixup_tags</tt> option which will attempt
to shift the document's tags to meet XHTML 1.0 Strict.

  doc = open("index.html") { |f| Hpricot f, :fixup_tags => true }

This doesn't quite meet the XHTML 1.0 Strict standard, it just tries to follow
the rules a bit better. Like: say Hpricot finds a paragraph in a link, it's
going to move the paragraph below the link. Or up and out of other elements
where paragraphs don't belong.

If an unknown element is found, it is ignored. Again, <tt>:fixup_tags</tt>.

== :xhtml_strict

So, let's go beyond just trying to fix the hierarchy. The
<tt>:xhtml_strict</tt> option really tries to force the document to be an XHTML
1.0 Strict document. Even at the cost of removing elements that get in the way.

  doc = open("index.html") { |f| Hpricot f, :xhtml_strict => true }

What measures does <tt>:xhtml_strict</tt> take?

1. Shift elements into their proper containers just like :fixup_tags.
2. Remove unknown elements.
3. Remove unknown attributes.
4. Remove illegal content.
5. Alter the doctype to XHTML 1.0 Strict.
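
Measures 2 and 3 in the list above amount to filtering against a table of known names. The sketch below illustrates that idea only. The <tt>ALLOWED</tt> table is a hypothetical, heavily abridged whitelist and <tt>strict_filter</tt> is an invented helper; neither is taken from Hpricot, whose real rules come from the XHTML 1.0 DTD.

```ruby
# Hypothetical, heavily abridged whitelist; the real XHTML 1.0 Strict DTD
# defines many more elements and attributes.
ALLOWED = {
  "p"   => %w[id class],
  "a"   => %w[id class href],
  "img" => %w[id class src alt]
}

# Each element is modelled as [tag_name, attributes_hash].
def strict_filter(elements)
  elements.each_with_object([]) do |(tag, attrs), kept|
    allowed = ALLOWED[tag]
    next unless allowed # measure 2: drop unknown elements
    # measure 3: keep only the attributes known for this tag
    kept << [tag, attrs.select { |key, _| allowed.include?(key) }]
  end
end

input = [
  ["p",       { "class" => "posted", "bgcolor" => "red" }],
  ["marquee", { "id" => "x" }],
  ["a",       { "href" => "/", "target" => "_blank" }]
]
p strict_filter(input)
```

Note that this kind of cleanup is lossy by design: the <tt>marquee</tt> element and the <tt>bgcolor</tt> and <tt>target</tt> attributes are simply gone from the result.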

== Hpricot.XML()

The last option is the <tt>:xml</tt> option, which makes some slight variations
on the standard mode. The main difference is that :xml mode won't try to output
tags which are friendlier for browsers. For example, if an opening and closing
<tt>br</tt> tag is found, XML mode won't try to turn that into an empty element.

XML mode also doesn't downcase the tags and attributes for you. So pay attention
to case, friends.

The primary way to use Hpricot's XML mode is to call the Hpricot.XML method:

  doc = open("http://redhanded.hobix.com/index.xml") do |f|
    Hpricot.XML(f)
  end

*Also, :fixup_tags is canceled out by the :xml option.* This is because
:fixup_tags makes assumptions based on how HTML is structured. Specifically, how
tags are defined in the XHTML 1.0 DTD.
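
The case rule is easy to trip over, so here is a tiny sketch of its consequence. <tt>normalize_name</tt> is a hypothetical helper written for this illustration, not Hpricot's code, and the tag names are sample values.

```ruby
# Hypothetical helper illustrating the rule: HTML mode downcases names,
# XML mode leaves them exactly as written.
def normalize_name(name, xml)
  xml ? name : name.downcase
end

tags = %w[rss channel pubDate]
html_names = tags.map { |t| normalize_name(t, false) }
xml_names  = tags.map { |t| normalize_name(t, true) }

# A search for "pubdate" would only line up with the HTML-mode names.
puts html_names.include?("pubdate")
puts xml_names.include?("pubdate")
```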
@@ -0,0 +1,211 @@
require 'rake'
require 'rake/clean'
require 'rake/gempackagetask'
require 'rake/rdoctask'
require 'rake/testtask'
require 'fileutils'
include FileUtils

NAME = "hpricot"
REV = `svn info`[/Revision: (\d+)/, 1] rescue nil
VERS = ENV['VERSION'] || "0.6" + (REV ? ".#{REV}" : "")
PKG = "#{NAME}-#{VERS}"
BIN = "*.{bundle,jar,so,obj,pdb,lib,def,exp}"
ARCHLIB = "lib/#{::Config::CONFIG['arch']}"
CLEAN.include ["ext/hpricot_scan/#{BIN}", "lib/**/#{BIN}", 'ext/hpricot_scan/Makefile',
               '**/.*.sw?', '*.gem', '.config']
RDOC_OPTS = ['--quiet', '--title', 'The Hpricot Reference', '--main', 'README', '--inline-source']
PKG_FILES = %w(CHANGELOG COPYING README Rakefile) +
  Dir.glob("{bin,doc,test,lib,extras}/**/*") +
  Dir.glob("ext/**/*.{h,java,c,rb,rl}") +
  %w[ext/hpricot_scan/hpricot_scan.c] # needed because it's generated later
SPEC =
  Gem::Specification.new do |s|
    s.name = NAME
    s.version = VERS
    s.platform = Gem::Platform::RUBY
    s.has_rdoc = true
    s.rdoc_options += RDOC_OPTS
    s.extra_rdoc_files = ["README", "CHANGELOG", "COPYING"]
    s.summary = "a swift, liberal HTML parser with a fantastic library"
    s.description = s.summary
    s.author = "why the lucky stiff"
    s.email = 'why@ruby-lang.org'
    s.homepage = 'http://code.whytheluckystiff.net/hpricot/'
    s.files = PKG_FILES
    s.require_paths = [ARCHLIB, "lib"]
    s.extensions = FileList["ext/**/extconf.rb"].to_a
    s.bindir = "bin"
  end

desc "Does a full compile, test run"
task :default => [:compile, :test]

desc "Packages up Hpricot."
task :package => [:clean, :ragel]

desc "Releases packages for all Hpricot packages and platforms."
task :release => [:package, :package_win32, :package_jruby]

desc "Run all the tests"
Rake::TestTask.new do |t|
  t.libs << "test" << ARCHLIB
  t.test_files = FileList['test/test_*.rb']
  t.verbose = true
end

Rake::RDocTask.new do |rdoc|
  rdoc.rdoc_dir = 'doc/rdoc'
  rdoc.options += RDOC_OPTS
  rdoc.main = "README"
  rdoc.rdoc_files.add ['README', 'CHANGELOG', 'COPYING', 'lib/**/*.rb']
end

Rake::GemPackageTask.new(SPEC) do |p|
  p.need_tar = true
  p.gem_spec = SPEC
end

extension = "hpricot_scan"
ext = "ext/hpricot_scan"
ext_so = "#{ext}/#{extension}.#{Config::CONFIG['DLEXT']}"
ext_files = FileList[
  "#{ext}/*.c",
  "#{ext}/*.h",
  "#{ext}/*.rl",
  "#{ext}/extconf.rb",
  "#{ext}/Makefile",
  "lib"
]

task "lib" do
  directory "lib"
end

desc "Compiles the Ruby extension"
task :compile => [:hpricot_scan] do
  if Dir.glob(File.join(ARCHLIB,"hpricot_scan.*")).length == 0
    STDERR.puts "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
    STDERR.puts "Gem actually failed to build. Your system is"
    STDERR.puts "NOT configured properly to build hpricot."
    STDERR.puts "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
    exit(1)
  end
end
task :hpricot_scan => [:ragel]

desc "Builds just the #{extension} extension"
task extension.to_sym => ["#{ext}/Makefile", ext_so ]

file "#{ext}/Makefile" => ["#{ext}/extconf.rb"] do
  Dir.chdir(ext) do ruby "extconf.rb" end
end

file ext_so => ext_files do
  Dir.chdir(ext) do
    sh(PLATFORM =~ /win32/ ? 'nmake' : 'make')
  end
  mkdir_p ARCHLIB
  cp ext_so, ARCHLIB
end

desc "returns the ragel version"
task :ragel_version do
  @ragel_v = `ragel -v`[/(version )(\S*)/,2].to_f
end

desc "Generates the C scanner code with Ragel."
task :ragel => [:ragel_version] do
  sh %{ragel ext/hpricot_scan/hpricot_scan.rl | #{@ragel_v >= 5.18 ? 'rlgen-cd' : 'rlcodegen'} -G2 -o ext/hpricot_scan/hpricot_scan.c}
end

desc "Generates the Java scanner code with Ragel."
task :ragel_java => [:ragel_version] do
  sh %{ragel -J ext/hpricot_scan/hpricot_scan.java.rl | #{@ragel_v >= 5.18 ? 'rlgen-java' : 'rlcodegen'} -o ext/hpricot_scan/HpricotScanService.java}
end

### Win32 Packages ###

Win32Spec = SPEC.dup
Win32Spec.platform = Gem::Platform::WIN32
Win32Spec.files = PKG_FILES + ["#{ARCHLIB}/hpricot_scan.so"]
Win32Spec.extensions = []

WIN32_PKG_DIR = "#{PKG}-mswin32"

desc "Package up the Win32 distribution."
file WIN32_PKG_DIR => [:package] do
  sh "tar zxf pkg/#{PKG}.tgz"
  mv PKG, WIN32_PKG_DIR
end

desc "Cross-compile the hpricot_scan extension for win32"
file "hpricot_scan_win32" => [WIN32_PKG_DIR] do
  cp "extras/mingw-rbconfig.rb", "#{WIN32_PKG_DIR}/ext/hpricot_scan/rbconfig.rb"
  sh "cd #{WIN32_PKG_DIR}/ext/hpricot_scan/ && ruby -I. extconf.rb && make"
  mv "#{WIN32_PKG_DIR}/ext/hpricot_scan/hpricot_scan.so", "#{WIN32_PKG_DIR}/#{ARCHLIB}"
end

desc "Build the binary RubyGems package for win32"
task :package_win32 => ["hpricot_scan_win32"] do
  Dir.chdir("#{WIN32_PKG_DIR}") do
    Gem::Builder.new(Win32Spec).build
    verbose(true) {
      mv Dir["*.gem"].first, "../pkg/#{WIN32_PKG_DIR}.gem"
    }
  end
end

CLEAN.include WIN32_PKG_DIR

### JRuby Packages ###

compile_java = proc do
  sh %{javac -source 1.4 -target 1.4 -classpath $JRUBY_HOME/lib/jruby.jar HpricotScanService.java}
  sh %{jar cf hpricot_scan.jar HpricotScanService.class}
end

desc "Compiles the JRuby extension"
task :hpricot_scan_java => [:ragel_java] do
  Dir.chdir("ext/hpricot_scan", &compile_java)
end

JRubySpec = SPEC.dup
JRubySpec.platform = 'jruby'
JRubySpec.files = PKG_FILES + ["#{ARCHLIB}/hpricot_scan.jar"]
JRubySpec.extensions = []

JRUBY_PKG_DIR = "#{PKG}-jruby"

desc "Package up the JRuby distribution."
file JRUBY_PKG_DIR => [:ragel_java, :package] do
  sh "tar zxf pkg/#{PKG}.tgz"
  mv PKG, JRUBY_PKG_DIR
end

desc "Cross-compile the hpricot_scan extension for JRuby"
file "hpricot_scan_jruby" => [JRUBY_PKG_DIR] do
  Dir.chdir("#{JRUBY_PKG_DIR}/ext/hpricot_scan", &compile_java)
  mv "#{JRUBY_PKG_DIR}/ext/hpricot_scan/hpricot_scan.jar", "#{JRUBY_PKG_DIR}/#{ARCHLIB}"
end

desc "Build the RubyGems package for JRuby"
task :package_jruby => ["hpricot_scan_jruby"] do
  Dir.chdir("#{JRUBY_PKG_DIR}") do
    Gem::Builder.new(JRubySpec).build
    verbose(true) {
      mv Dir["*.gem"].first, "../pkg/#{JRUBY_PKG_DIR}.gem"
    }
  end
end

CLEAN.include JRUBY_PKG_DIR

task :install do
  sh %{rake package}
  sh %{sudo gem install pkg/#{NAME}-#{VERS}}
end

task :uninstall => [:clean] do
  sh %{sudo gem uninstall #{NAME}}
end
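
A note on the version handling in the Rakefile above, as a sketch rather than a proposed change. It shows what the <tt>String#[]</tt> calls with a regexp and a group index return, and how a float comparison orders dotted version strings. The sample <tt>svn info</tt> text and the <tt>ragel -v</tt> banner are made-up stand-ins (the exact wording of the real tools' output is an assumption), and <tt>Gem::Version</tt> is named here only as one alternative for segment-wise comparison.

```ruby
require 'rubygems'

# Sample text standing in for `svn info` output; the Rakefile shells out for it.
svn_info = "Path: .\nURL: http://example.com/svn/trunk\nRevision: 123\n"

# String#[] with a regexp and a group index returns that capture, or nil
# when the pattern does not match.
rev  = svn_info[/Revision: (\d+)/, 1]
vers = "0.6" + (rev ? ".#{rev}" : "")
puts vers

# Sample `ragel -v` banner; the wording is an assumption for illustration.
banner  = "Ragel State Machine Compiler version 5.20 April 2007"
ragel_f = banner[/(version )(\S*)/, 2].to_f
puts ragel_f

# As a float, "5.2" reads as 5.2, which is greater than 5.18.
# Gem::Version compares segment by segment, so 5.2 sorts before 5.18.
float_says   = "5.2".to_f >= 5.18
version_says = Gem::Version.new("5.2") >= Gem::Version.new("5.18")
puts "float: #{float_says}, Gem::Version: #{version_says}"
```

Whether the difference matters depends on how a tool numbers its releases. For a series that runs 5.2, 5.3 and so on up to 5.18, the two comparisons disagree about which side of 5.18 an early release falls on.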
File diff suppressed because it is too large
@@ -0,0 +1,6 @@
require 'mkmf'

dir_config("hpricot_scan")
have_library("c", "main")

create_makefile("hpricot_scan")
@@ -0,0 +1,76 @@
%%{

  machine hpricot_common;

  #
  # HTML tokens
  # (a blatant rip from HTree)
  #
  newline = '\n' @{curline += 1;} ;
  NameChar = [\-A-Za-z0-9._:?] ;
  Name = [A-Za-z_:] NameChar* ;
  StartComment = "<!--" ;
  EndComment = "-->" ;
  StartCdata = "<![CDATA[" ;
  EndCdata = "]]>" ;

  NameCap = Name >_tag %tag;
  NameAttr = NameChar+ >_akey %akey ;
  Q1Char = ( "\\\'" | [^'] ) ;
  Q1Attr = Q1Char* >_aval %aval ;
  Q2Char = ( "\\\"" | [^"] ) ;
  Q2Attr = Q2Char* >_aval %aval ;
  UnqAttr = ( space >_aval | [^ \t\r\n<>"'] >_aval [^ \t\r\n<>]* %aunq ) ;
  Nmtoken = NameChar+ >_akey %akey ;

  Attr = NameAttr space* "=" space* ('"' Q2Attr '"' | "'" Q1Attr "'" | UnqAttr space+ ) space* ;
  AttrEnd = ( NameAttr space* "=" space* UnqAttr? | Nmtoken >new_attr %save_attr ) ;
  AttrSet = ( Attr >new_attr %save_attr | Nmtoken >new_attr space+ %save_attr ) ;
  StartTag = "<" NameCap space+ AttrSet* (AttrEnd >new_attr %save_attr)? ">" | "<" NameCap ">";
  EmptyTag = "<" NameCap space+ AttrSet* (AttrEnd >new_attr %save_attr)? "/>" | "<" NameCap "/>" ;

  EndTag = "</" NameCap space* ">" ;
  XmlVersionNum = [a-zA-Z0-9_.:\-]+ >_aval %xmlver ;
  XmlVersionInfo = space+ "version" space* "=" space* ("'" XmlVersionNum "'" | '"' XmlVersionNum '"' ) ;
  XmlEncName = [A-Za-z] >_aval [A-Za-z0-9._\-]* %xmlenc ;
  XmlEncodingDecl = space+ "encoding" space* "=" space* ("'" XmlEncName "'" | '"' XmlEncName '"' ) ;
  XmlYesNo = ("yes" | "no") >_aval %xmlsd ;
  XmlSDDecl = space+ "standalone" space* "=" space* ("'" XmlYesNo "'" | '"' XmlYesNo '"') ;
  XmlDecl = "<?xml" XmlVersionInfo XmlEncodingDecl? XmlSDDecl? space* "?"? ">" ;

  SystemLiteral = '"' [^"]* >_aval %sysid '"' | "'" [^']* >_aval %sysid "'" ;
  PubidLiteral = '"' [\t a-zA-Z0-9\-'()+,./:=?;!*\#@$_%]* >_aval %pubid '"' |
    "'" [\t a-zA-Z0-9\-'()+,./:=?;!*\#@$_%]* >_aval %pubid "'" ;
  ExternalID = ( "SYSTEM" | "PUBLIC" space+ PubidLiteral ) (space+ SystemLiteral)? ;
  DocType = "<!DOCTYPE" space+ NameCap (space+ ExternalID)? space* ("[" [^\]]* "]" space*)? ">" ;
  StartXmlProcIns = "<?" Name >{ TEXT_PASS(); } space+ ;
  EndXmlProcIns = "?"? ">" ;

  html_comment := |*
    EndComment @{ EBLK(comment, 3); fgoto main; };
    any | newline { TEXT_PASS(); };
  *|;

  html_cdata := |*
    EndCdata @{ EBLK(cdata, 3); fgoto main; };
    any | newline { TEXT_PASS(); };
  *|;

  html_procins := |*
    EndXmlProcIns @{ EBLK(procins, 2); fgoto main; };
    any | newline { TEXT_PASS(); };
  *|;

  main := |*
    XmlDecl >newEle { ELE(xmldecl); };
    DocType >newEle { ELE(doctype); };
    StartXmlProcIns >newEle { fgoto html_procins; };
    StartTag >newEle { ELE(stag); };
    EndTag >newEle { ELE(etag); };
    EmptyTag >newEle { ELE(emptytag); };
    StartComment >newEle { fgoto html_comment; };
    StartCdata >newEle { fgoto html_cdata; };
    any | newline { TEXT_PASS(); };
  *|;

}%%;
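
For readers who don't speak Ragel, the two simplest definitions in the grammar above, NameChar and Name, can be approximated with Ruby regular expressions. This is a rough translation for illustration only: the constant names are invented here, and the Ragel actions attached to tokens in the grammar (such as <tt>>_tag</tt> and <tt>%tag</tt>) have no counterpart in it.

```ruby
# Rough Ruby translations of two definitions from the grammar:
#   NameChar = [\-A-Za-z0-9._:?] ;
#   Name     = [A-Za-z_:] NameChar* ;
NAME_CHAR = /[\-A-Za-z0-9._:?]/
NAME      = /\A[A-Za-z_:]#{NAME_CHAR}*\z/

# True when the whole string is a valid tag name under these rules.
def tag_name?(str)
  !!NAME.match(str)
end

%w[h1 xsl:template my-tag 1abc -x].each do |candidate|
  puts "#{candidate}: #{tag_name?(candidate)}"
end
```

The first character is more restricted than the rest: a name may contain digits, dots and hyphens, but may not start with one. The colon is allowed in both positions, which lines up with the changelog's note about some support for colons in tag names.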
File diff suppressed because it is too large
@@ -0,0 +1,79 @@
/*
 * hpricot_scan.h
 *
 * $Author: why $
 * $Date: 2006-05-08 22:03:50 -0600 (Mon, 08 May 2006) $
 *
 * Copyright (C) 2006 why the lucky stiff
 * You can redistribute it and/or modify it under the same terms as Ruby.
 */

#ifndef hpricot_scan_h
#define hpricot_scan_h

#include <sys/types.h>

#if defined(_WIN32)
#include <stddef.h>
#endif

/*
 * Memory Allocation
 */
#if defined(HAVE_ALLOCA_H) && !defined(__GNUC__)
#include <alloca.h>
#endif

#ifndef NULL
# define NULL (void *)0
#endif

#define BUFSIZE 16384

#define S_ALLOC_N(type,n) (type*)malloc(sizeof(type)*(n))
#define S_ALLOC(type) (type*)malloc(sizeof(type))
#define S_REALLOC_N(var,type,n) (var)=(type*)realloc((char*)(var),sizeof(type)*(n))
#define S_FREE(n) free(n); n = NULL;

#define S_ALLOCA_N(type,n) (type*)alloca(sizeof(type)*(n))

#define S_MEMZERO(p,type,n) memset((p), 0, sizeof(type)*(n))
#define S_MEMCPY(p1,p2,type,n) memcpy((p1), (p2), sizeof(type)*(n))
#define S_MEMMOVE(p1,p2,type,n) memmove((p1), (p2), sizeof(type)*(n))
#define S_MEMCMP(p1,p2,type,n) memcmp((p1), (p2), sizeof(type)*(n))

typedef struct {
  void *name;
  void *attributes;
} hpricot_element;

typedef void (*hpricot_element_cb)(void *data, hpricot_element *token);

typedef struct hpricot_scan {
  int lineno;
  int cs;
  size_t nread;
  size_t mark;

  void *data;

  hpricot_element_cb xmldecl;
  hpricot_element_cb doctype;
  hpricot_element_cb xmlprocins;
  hpricot_element_cb starttag;
  hpricot_element_cb endtag;
  hpricot_element_cb emptytag;
  hpricot_element_cb comment;
  hpricot_element_cb cdata;

} http_scan;

// int hpricot_scan_init(hpricot_scan *scan);
// int hpricot_scan_finish(hpricot_scan *scan);
// size_t hpricot_scan_execute(hpricot_scan *scan, const char *data, size_t len, size_t off);
// int hpricot_scan_has_error(hpricot_scan *scan);
// int hpricot_scan_is_finished(hpricot_scan *scan);
//
// #define hpricot_scan_nread(scan) (scan)->nread

#endif
@ -0,0 +1,363 @@

import java.io.IOException;

import org.jruby.Ruby;
import org.jruby.RubyClass;
import org.jruby.RubyHash;
import org.jruby.RubyModule;
import org.jruby.RubyNumeric;
import org.jruby.RubyString;
import org.jruby.runtime.Block;
import org.jruby.runtime.CallbackFactory;
import org.jruby.runtime.builtin.IRubyObject;
import org.jruby.exceptions.RaiseException;
import org.jruby.runtime.load.BasicLibraryService;

public class HpricotScanService implements BasicLibraryService {
  public static String NO_WAY_SERIOUSLY="*** This should not happen, please send a bug report with the HTML you're parsing to why@whytheluckystiff.net. So sorry!";

  public void ELE(IRubyObject N) {
    if (tokend > tokstart || text) {
      IRubyObject raw_string = runtime.getNil();
      ele_open = false; text = false;
      if (tokstart != -1 && N != cdata && N != sym_text && N != procins && N != comment) {
        raw_string = runtime.newString(new String(buf,tokstart,tokend-tokstart));
      }
      rb_yield_tokens(N, tag[0], attr, raw_string, taint);
    }
  }

  public void SET(IRubyObject[] N, int E) {
    int mark = 0;
    if(N == tag) {
      if(mark_tag == -1 || E == mark_tag) {
        tag[0] = runtime.newString("");
      } else if(E > mark_tag) {
        tag[0] = runtime.newString(new String(buf,mark_tag, E-mark_tag));
      }
    } else if(N == akey) {
      if(mark_akey == -1 || E == mark_akey) {
        akey[0] = runtime.newString("");
      } else if(E > mark_akey) {
        akey[0] = runtime.newString(new String(buf,mark_akey, E-mark_akey));
      }
    } else if(N == aval) {
      if(mark_aval == -1 || E == mark_aval) {
        aval[0] = runtime.newString("");
      } else if(E > mark_aval) {
        aval[0] = runtime.newString(new String(buf,mark_aval, E-mark_aval));
      }
    }
  }

  public void CAT(IRubyObject[] N, int E) {
    if(N[0].isNil()) {
      SET(N,E);
    } else {
      int mark = 0;
      if(N == tag) {
        mark = mark_tag;
      } else if(N == akey) {
        mark = mark_akey;
      } else if(N == aval) {
        mark = mark_aval;
      }
      ((RubyString)(N[0])).append(runtime.newString(new String(buf, mark, E-mark)));
    }
  }

  public void SLIDE(Object N) {
    int mark = 0;
    if(N == tag) {
      mark = mark_tag;
    } else if(N == akey) {
      mark = mark_akey;
    } else if(N == aval) {
      mark = mark_aval;
    }
    if(mark > tokstart) {
      if(N == tag) {
        mark_tag -= tokstart;
      } else if(N == akey) {
        mark_akey -= tokstart;
      } else if(N == aval) {
        mark_aval -= tokstart;
      }
    }
  }

  public void ATTR(IRubyObject K, IRubyObject V) {
    if(!K.isNil()) {
      if(attr.isNil()) {
        attr = RubyHash.newHash(runtime);
      }
      ((RubyHash)attr).aset(K,V);
    }
  }

  public void ATTR(IRubyObject[] K, IRubyObject V) {
    ATTR(K[0],V);
  }

  public void ATTR(IRubyObject K, IRubyObject[] V) {
    ATTR(K,V[0]);
  }

  public void ATTR(IRubyObject[] K, IRubyObject[] V) {
    ATTR(K[0],V[0]);
  }

  public void TEXT_PASS() {
    if(!text) {
      if(ele_open) {
        ele_open = false;
        if(tokstart > -1) {
          mark_tag = tokstart;
        }
      } else {
        mark_tag = p;
      }
      attr = runtime.getNil();
      tag[0] = runtime.getNil();
      text = true;
    }
  }

  public void EBLK(IRubyObject N, int T) {
    CAT(tag, p - T + 1);
    ELE(N);
  }


  public void rb_raise(RubyClass error, String message) {
    throw new RaiseException(runtime, error, message, true);
  }

  public IRubyObject rb_str_new2(String s) {
    return runtime.newString(s);
  }

%%{
  machine hpricot_scan;

  action newEle {
    if (text) {
      CAT(tag, p);
      ELE(sym_text);
      text = false;
    }
    attr = runtime.getNil();
    tag[0] = runtime.getNil();
    mark_tag = -1;
    ele_open = true;
  }

  action _tag { mark_tag = p; }
  action _aval { mark_aval = p; }
  action _akey { mark_akey = p; }
  action tag { SET(tag, p); }
  action tagc { SET(tag, p-1); }
  action aval { SET(aval, p); }
  action aunq {
    if (buf[p-1] == '"' || buf[p-1] == '\'') { SET(aval, p-1); }
    else { SET(aval, p); }
  }
  action akey { SET(akey, p); }
  action xmlver { SET(aval, p); ATTR(rb_str_new2("version"), aval); }
  action xmlenc { SET(aval, p); ATTR(rb_str_new2("encoding"), aval); }
  action xmlsd { SET(aval, p); ATTR(rb_str_new2("standalone"), aval); }
  action pubid { SET(aval, p); ATTR(rb_str_new2("public_id"), aval); }
  action sysid { SET(aval, p); ATTR(rb_str_new2("system_id"), aval); }

  action new_attr {
    akey[0] = runtime.getNil();
    aval[0] = runtime.getNil();
    mark_akey = -1;
    mark_aval = -1;
  }

  action save_attr {
    ATTR(akey, aval);
  }

  include hpricot_common "ext/hpricot_scan/hpricot_common.rl";

}%%

%% write data nofinal;

  public final static int BUFSIZE=16384;

  private void rb_yield_tokens(IRubyObject sym, IRubyObject tag, IRubyObject attr, IRubyObject raw, boolean taint) {
    IRubyObject ary;
    if (sym == runtime.newSymbol("text")) {
      raw = tag;
    }
    ary = runtime.newArray(new IRubyObject[]{sym, tag, attr, raw});
    if (taint) {
      ary.setTaint(true);
      tag.setTaint(true);
      attr.setTaint(true);
      raw.setTaint(true);
    }
    block.yield(runtime.getCurrentContext(), ary, null, null, false);
  }


  int cs, act, have = 0, nread = 0, curline = 1, p=-1;
  boolean text = false;
  int tokstart=-1, tokend;
  char[] buf;
  Ruby runtime;
  IRubyObject attr, bufsize;
  IRubyObject[] tag, akey, aval;
  int mark_tag, mark_akey, mark_aval;
  boolean done = false, ele_open = false;
  int buffer_size = 0;
  boolean taint = false;
  Block block = null;


  IRubyObject xmldecl, doctype, procins, stag, etag, emptytag, comment,
      cdata, sym_text;

  IRubyObject hpricot_scan(IRubyObject recv, IRubyObject port) {
    attr = bufsize = runtime.getNil();
    tag = new IRubyObject[]{runtime.getNil()};
    akey = new IRubyObject[]{runtime.getNil()};
    aval = new IRubyObject[]{runtime.getNil()};

    RubyClass rb_eHpricotParseError = runtime.getModule("Hpricot").getClass("ParseError");

    taint = port.isTaint();
    if ( !port.respondsTo("read")) {
      if ( port.respondsTo("to_str")) {
        port = port.callMethod(runtime.getCurrentContext(),"to_str");
      } else {
        throw runtime.newArgumentError("bad Hpricot argument, String or IO only please.");
      }
    }

    buffer_size = BUFSIZE;
    if (recv.getInstanceVariable("@buffer_size") != null) {
      bufsize = recv.getInstanceVariable("@buffer_size");
      if (!bufsize.isNil()) {
        buffer_size = RubyNumeric.fix2int(bufsize);
      }
    }
    buf = new char[buffer_size];

    %% write init;

    while( !done ) {
      IRubyObject str;
      p = have;
      int pe;
      int len, space = buffer_size - have;

      if ( space == 0 ) {
        /* We've used up the entire buffer storing an already-parsed token
         * prefix that must be preserved.  Likely caused by super-long attributes.
         * See ticket #13. */
        /* tag is a one-element holder array, so report tag[0], not the array itself */
        rb_raise(rb_eHpricotParseError, "ran out of buffer space on element <" + tag[0].toString() + ">, starting on line "+curline+".");
      }

      if (port.respondsTo("read")) {
        str = port.callMethod(runtime.getCurrentContext(),"read",runtime.newFixnum(space));
      } else {
        str = ((RubyString)port).substr(nread,space);
      }

      str = str.convertToString();
      String sss = str.toString();
      char[] chars = sss.toCharArray();
      System.arraycopy(chars,0,buf,p,chars.length);

      len = sss.length();
      nread += len;

      if ( len < space ) {
        len++;
        done = true;
      }

      pe = p + len;
      char[] data = buf;

      %% write exec;

      if ( cs == hpricot_scan_error ) {
        if(!tag[0].isNil()) {
          rb_raise(rb_eHpricotParseError, "parse error on element <"+tag[0].toString()+">, starting on line "+curline+".\n" + NO_WAY_SERIOUSLY);
        } else {
          rb_raise(rb_eHpricotParseError, "parse error on line "+curline+".\n" + NO_WAY_SERIOUSLY);
        }
      }

      if ( done && ele_open ) {
        ele_open = false;
        if(tokstart > -1) {
          mark_tag = tokstart;
          tokstart = -1;
          text = true;
        }
      }

      if(tokstart == -1) {
        have = 0;
        /* text nodes have no tokstart because each byte is parsed alone */
        if(mark_tag != -1 && text) {
          if (done) {
            if(mark_tag < p-1) {
              CAT(tag, p-1);
              ELE(sym_text);
            }
          } else {
            CAT(tag, p);
          }
        }
        mark_tag = 0;
      } else {
        have = pe - tokstart;
        System.arraycopy(buf,tokstart,buf,0,have);
        SLIDE(tag);
        SLIDE(akey);
        SLIDE(aval);
        tokend = (tokend - tokstart);
        tokstart = 0;
      }
    }
    return runtime.getNil();
  }

  public static IRubyObject __hpricot_scan(IRubyObject recv, IRubyObject port, Block block) {
    Ruby runtime = recv.getRuntime();
    HpricotScanService service = new HpricotScanService();
    service.runtime = runtime;
    service.xmldecl = runtime.newSymbol("xmldecl");
    service.doctype = runtime.newSymbol("doctype");
    service.procins = runtime.newSymbol("procins");
    service.stag = runtime.newSymbol("stag");
    service.etag = runtime.newSymbol("etag");
    service.emptytag = runtime.newSymbol("emptytag");
    service.comment = runtime.newSymbol("comment");
    service.cdata = runtime.newSymbol("cdata");
    service.sym_text = runtime.newSymbol("text");
    service.block = block;
    return service.hpricot_scan(recv, port);
  }


  public boolean basicLoad(final Ruby runtime) throws IOException {
    Init_hpricot_scan(runtime);
    return true;
  }

  public static void Init_hpricot_scan(Ruby runtime) {
    RubyModule mHpricot = runtime.defineModule("Hpricot");
    mHpricot.getMetaClass().attr_accessor(new IRubyObject[]{runtime.newSymbol("buffer_size")});
    CallbackFactory fact = runtime.callbackFactory(HpricotScanService.class);
    mHpricot.getMetaClass().defineMethod("scan",fact.getSingletonMethod("__hpricot_scan",IRubyObject.class));
    mHpricot.defineClassUnder("ParseError",runtime.getClass("Exception"),runtime.getClass("Exception").getAllocator());
  }
}
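The read loop in the scanner keeps the unfinished part of a token at the front of a fixed-size buffer and refills only the remaining space. The following is a minimal Ruby sketch of that idea, not the scanner itself: `each_token` is a hypothetical helper, and treating `>` as the end of a token is a simplifying assumption made only for illustration.

```ruby
require 'stringio'

# Hypothetical helper (not part of Hpricot): reads +io+ in chunks that fit a
# buffer of +buffer_size+ bytes and yields every complete token ending in ">".
# The unfinished tail of the buffer is kept and the next read only fills the
# space that is left, mirroring the "have" / "space" bookkeeping above.
def each_token(io, buffer_size)
  buf = +""
  loop do
    space = buffer_size - buf.bytesize
    # A single token larger than the buffer can never be completed.
    raise "ran out of buffer space" if space == 0

    chunk = io.read(space)
    done = chunk.nil? || chunk.bytesize < space
    buf << chunk.to_s

    # Emit every complete token currently held in the buffer.
    while (idx = buf.index(">"))
      yield buf.slice!(0..idx)
    end

    # Whatever remains in buf is the preserved prefix of an unfinished token.
    if done
      yield buf unless buf.empty?
      break
    end
  end
end

tokens = []
each_token(StringIO.new("<a><bold><i>"), 6) { |t| tokens << t }
```

With a six-byte buffer, `<bold>` arrives in two reads and is only yielded once the second read completes it. A token longer than the buffer raises, which is the situation the "ran out of buffer space" error reports; raising `Hpricot.buffer_size` is the remedy the changelog suggests.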
@ -0,0 +1,273 @@
/*
 * hpricot_scan.rl
 *
 * $Author: why $
 * $Date: 2006-05-08 22:03:50 -0600 (Mon, 08 May 2006) $
 *
 * Copyright (C) 2006 why the lucky stiff
 */
#include <ruby.h>

#define NO_WAY_SERIOUSLY "*** This should not happen, please send a bug report with the HTML you're parsing to why@whytheluckystiff.net.  So sorry!"

static VALUE sym_xmldecl, sym_doctype, sym_procins, sym_stag, sym_etag, sym_emptytag, sym_comment,
      sym_cdata, sym_text;
static VALUE rb_eHpricotParseError;
static ID s_read, s_to_str;

#define ELE(N) \
  if (tokend > tokstart || text == 1) { \
    VALUE raw_string = Qnil; \
    ele_open = 0; text = 0; \
    if (tokstart != 0 && sym_##N != sym_cdata && sym_##N != sym_text && sym_##N != sym_procins && sym_##N != sym_comment) { \
      raw_string = rb_str_new(tokstart, tokend-tokstart); \
    } \
    rb_yield_tokens(sym_##N, tag, attr, raw_string, taint); \
  }

#define SET(N, E) \
  if (mark_##N == NULL || E == mark_##N) \
    N = rb_str_new2(""); \
  else if (E > mark_##N) \
    N = rb_str_new(mark_##N, E - mark_##N);

#define CAT(N, E) if (NIL_P(N)) { SET(N, E); } else { rb_str_cat(N, mark_##N, E - mark_##N); }

#define SLIDE(N) if ( mark_##N > tokstart ) mark_##N = buf + (mark_##N - tokstart);

#define ATTR(K, V) \
    if (!NIL_P(K)) { \
      if (NIL_P(attr)) attr = rb_hash_new(); \
      rb_hash_aset(attr, K, V); \
    }

#define TEXT_PASS() \
    if (text == 0) \
    { \
      if (ele_open == 1) { \
        ele_open = 0; \
        if (tokstart > 0) { \
          mark_tag = tokstart; \
        } \
      } else { \
        mark_tag = p; \
      } \
      attr = Qnil; \
      tag = Qnil; \
      text = 1; \
    }

#define EBLK(N, T) CAT(tag, p - T + 1); ELE(N);

%%{
  machine hpricot_scan;

  action newEle {
    if (text == 1) {
      CAT(tag, p);
      ELE(text);
      text = 0;
    }
    attr = Qnil;
    tag = Qnil;
    mark_tag = NULL;
    ele_open = 1;
  }

  action _tag { mark_tag = p; }
  action _aval { mark_aval = p; }
  action _akey { mark_akey = p; }
  action tag { SET(tag, p); }
  action tagc { SET(tag, p-1); }
  action aval { SET(aval, p); }
  action aunq {
    if (*(p-1) == '"' || *(p-1) == '\'') { SET(aval, p-1); }
    else { SET(aval, p); }
  }
  action akey { SET(akey, p); }
  action xmlver { SET(aval, p); ATTR(rb_str_new2("version"), aval); }
  action xmlenc { SET(aval, p); ATTR(rb_str_new2("encoding"), aval); }
  action xmlsd { SET(aval, p); ATTR(rb_str_new2("standalone"), aval); }
  action pubid { SET(aval, p); ATTR(rb_str_new2("public_id"), aval); }
  action sysid { SET(aval, p); ATTR(rb_str_new2("system_id"), aval); }

  action new_attr {
    akey = Qnil;
    aval = Qnil;
    mark_akey = NULL;
    mark_aval = NULL;
  }

  action save_attr {
    ATTR(akey, aval);
  }

  include hpricot_common "ext/hpricot_scan/hpricot_common.rl";

}%%

%% write data nofinal;

#define BUFSIZE 16384

void rb_yield_tokens(VALUE sym, VALUE tag, VALUE attr, VALUE raw, int taint)
{
  VALUE ary;
  if (sym == sym_text) {
    raw = tag;
  }
  ary = rb_ary_new3(4, sym, tag, attr, raw);
  if (taint) {
    OBJ_TAINT(ary);
    OBJ_TAINT(tag);
    OBJ_TAINT(attr);
    OBJ_TAINT(raw);
  }
  rb_yield(ary);
}

VALUE hpricot_scan(VALUE self, VALUE port)
{
  int cs, act, have = 0, nread = 0, curline = 1, text = 0;
  char *tokstart = 0, *tokend = 0, *buf = NULL;

  VALUE attr = Qnil, tag = Qnil, akey = Qnil, aval = Qnil, bufsize = Qnil;
  char *mark_tag = 0, *mark_akey = 0, *mark_aval = 0;
  int done = 0, ele_open = 0, buffer_size = 0;

  int taint = OBJ_TAINTED( port );
  if ( !rb_respond_to( port, s_read ) )
  {
    if ( rb_respond_to( port, s_to_str ) )
    {
      port = rb_funcall( port, s_to_str, 0 );
      StringValue(port);
    }
    else
    {
      rb_raise( rb_eArgError, "bad Hpricot argument, String or IO only please." );
    }
  }

  buffer_size = BUFSIZE;
  if (rb_ivar_defined(self, rb_intern("@buffer_size")) == Qtrue) {
    bufsize = rb_ivar_get(self, rb_intern("@buffer_size"));
    if (!NIL_P(bufsize)) {
      buffer_size = NUM2INT(bufsize);
    }
  }
  buf = ALLOC_N(char, buffer_size);

  %% write init;

  while ( !done ) {
    VALUE str;
    char *p = buf + have, *pe;
    int len, space = buffer_size - have;

    if ( space == 0 ) {
      /* We've used up the entire buffer storing an already-parsed token
       * prefix that must be preserved.  Likely caused by super-long attributes.
       * See ticket #13. */
      rb_raise(rb_eHpricotParseError, "ran out of buffer space on element <%s>, starting on line %d.", RSTRING(tag)->ptr, curline);
    }

    if ( rb_respond_to( port, s_read ) )
    {
      str = rb_funcall( port, s_read, 1, INT2FIX(space) );
    }
    else
    {
      str = rb_str_substr( port, nread, space );
    }

    StringValue(str);
    memcpy( p, RSTRING(str)->ptr, RSTRING(str)->len );
    len = RSTRING(str)->len;
    nread += len;

    /* If this is the last buffer, tack on an EOF. */
    if ( len < space ) {
      p[len++] = 0;
      done = 1;
    }

    pe = p + len;
    %% write exec;

    if ( cs == hpricot_scan_error ) {
      free(buf);
      if ( !NIL_P(tag) )
      {
        rb_raise(rb_eHpricotParseError, "parse error on element <%s>, starting on line %d.\n" NO_WAY_SERIOUSLY, RSTRING(tag)->ptr, curline);
      }
      else
      {
        rb_raise(rb_eHpricotParseError, "parse error on line %d.\n" NO_WAY_SERIOUSLY, curline);
      }
    }

    if ( done && ele_open )
    {
      ele_open = 0;
      if (tokstart > 0) {
        mark_tag = tokstart;
        tokstart = 0;
        text = 1;
      }
    }

    if ( tokstart == 0 )
    {
      have = 0;
      /* text nodes have no tokstart because each byte is parsed alone */
      if ( mark_tag != NULL && text == 1 )
      {
        if (done)
        {
          if (mark_tag < p-1)
          {
            CAT(tag, p-1);
            ELE(text);
          }
        }
        else
        {
          CAT(tag, p);
        }
      }
      mark_tag = buf;
    }
    else
    {
      have = pe - tokstart;
      memmove( buf, tokstart, have );
      SLIDE(tag);
      SLIDE(akey);
      SLIDE(aval);
      tokend = buf + (tokend - tokstart);
      tokstart = buf;
    }
  }
  free(buf);
  /* the function is declared to return VALUE, so return nil explicitly */
  return Qnil;
}

void Init_hpricot_scan()
{
  VALUE mHpricot = rb_define_module("Hpricot");
  rb_define_attr(rb_singleton_class(mHpricot), "buffer_size", 1, 1);
  rb_define_singleton_method(mHpricot, "scan", hpricot_scan, 1);
  rb_eHpricotParseError = rb_define_class_under(mHpricot, "ParseError", rb_eException);

  s_read = rb_intern("read");
  s_to_str = rb_intern("to_str");
  sym_xmldecl = ID2SYM(rb_intern("xmldecl"));
  sym_doctype = ID2SYM(rb_intern("doctype"));
  sym_procins = ID2SYM(rb_intern("procins"));
  sym_stag = ID2SYM(rb_intern("stag"));
  sym_etag = ID2SYM(rb_intern("etag"));
  sym_emptytag = ID2SYM(rb_intern("emptytag"));
  sym_comment = ID2SYM(rb_intern("comment"));
  sym_cdata = ID2SYM(rb_intern("cdata"));
  sym_text = ID2SYM(rb_intern("text"));
}
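`rb_yield_tokens` hands each token to the caller's block as a four-element array `[sym, tag, attr, raw]`, and for `:text` tokens `raw` is set to the same value as `tag`. The sketch below shows how a consumer might dispatch on that shape. The token list is written by hand for illustration (the raw strings and attribute hash are assumptions, not captured scanner output), and `outline` is a hypothetical helper.

```ruby
# Hand-written tuples in the [sym, tag, attr, raw] shape; a real run would
# receive these from the block passed to the scanner instead.
TOKENS = [
  [:stag,     "p",     { "class" => "x" }, "<p class=\"x\">"],
  [:text,     "hello", nil,                "hello"],
  [:emptytag, "br",    nil,                "<br/>"],
  [:etag,     "p",     nil,                "</p>"],
]

# Hypothetical consumer: builds an indented outline from the token stream.
def outline(tokens)
  depth = 0
  lines = []
  tokens.each do |sym, tag, _attr, _raw|
    indent = "  " * depth
    case sym
    when :stag
      lines << "#{indent}<#{tag}>"
      depth += 1                      # children of this element nest deeper
    when :etag
      depth -= 1                      # closing tag ends the current level
    when :emptytag
      lines << "#{indent}<#{tag}/>"
    when :text
      lines << "#{indent}#{tag.inspect}"
    end
  end
  lines
end

LINES = outline(TOKENS)
```

The remaining symbols defined in `Init_hpricot_scan` (`:xmldecl`, `:doctype`, `:procins`, `:comment`, `:cdata`) would get their own `when` branches in a fuller consumer.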
@ -0,0 +1,176 @@

# This rbconfig.rb corresponds to a Ruby installation for win32 cross-compiled
# with mingw under i686-linux. It can be used to cross-compile extensions for
# win32 using said toolchain.
#
# This file assumes that a cross-compiled mingw32 build (compatible with the
# mswin32 builds) is installed under $HOME/ruby-mingw32.

module Config
  #RUBY_VERSION == "1.8.5" or
  #  raise "ruby lib version (1.8.5) doesn't match executable version (#{RUBY_VERSION})"

  mingw32 = ENV['MINGW32_RUBY'] || "#{ENV["HOME"]}/ruby-mingw32"
  mingwpre = ENV['MINGW32_PREFIX']
  TOPDIR = File.dirname(__FILE__).chomp!("/lib/ruby/1.8/i386-mingw32")
  DESTDIR = '' unless defined? DESTDIR
  CONFIG = {}
  CONFIG["DESTDIR"] = DESTDIR
  CONFIG["INSTALL"] = "/usr/bin/install -c"
  CONFIG["prefix"] = (TOPDIR || DESTDIR + mingw32)
  CONFIG["EXEEXT"] = ".exe"
  CONFIG["ruby_install_name"] = "ruby"
  CONFIG["RUBY_INSTALL_NAME"] = "ruby"
  CONFIG["RUBY_SO_NAME"] = "msvcrt-ruby18"
  CONFIG["SHELL"] = "/bin/sh"
  CONFIG["PATH_SEPARATOR"] = ":"
  CONFIG["PACKAGE_NAME"] = ""
  CONFIG["PACKAGE_TARNAME"] = ""
  CONFIG["PACKAGE_VERSION"] = ""
  CONFIG["PACKAGE_STRING"] = ""
  CONFIG["PACKAGE_BUGREPORT"] = ""
  CONFIG["exec_prefix"] = "$(prefix)"
  CONFIG["bindir"] = "$(exec_prefix)/bin"
  CONFIG["sbindir"] = "$(exec_prefix)/sbin"
  CONFIG["libexecdir"] = "$(exec_prefix)/libexec"
  CONFIG["datadir"] = "$(prefix)/share"
  CONFIG["sysconfdir"] = "$(prefix)/etc"
  CONFIG["sharedstatedir"] = "$(prefix)/com"
  CONFIG["localstatedir"] = "$(prefix)/var"
  CONFIG["libdir"] = "$(exec_prefix)/lib"
  CONFIG["includedir"] = "$(prefix)/include"
  CONFIG["oldincludedir"] = "/usr/include"
  CONFIG["infodir"] = "$(prefix)/info"
  CONFIG["mandir"] = "$(prefix)/man"
  CONFIG["build_alias"] = "i686-linux"
  CONFIG["host_alias"] = "#{mingwpre}"
  CONFIG["target_alias"] = "i386-mingw32"
  CONFIG["ECHO_C"] = ""
  CONFIG["ECHO_N"] = "-n"
  CONFIG["ECHO_T"] = ""
  CONFIG["LIBS"] = "-lwsock32 "
  CONFIG["MAJOR"] = "1"
  CONFIG["MINOR"] = "8"
  CONFIG["TEENY"] = "4"
  CONFIG["build"] = "i686-pc-linux"
  CONFIG["build_cpu"] = "i686"
  CONFIG["build_vendor"] = "pc"
  CONFIG["build_os"] = "linux"
  CONFIG["host"] = "i586-pc-mingw32msvc"
  CONFIG["host_cpu"] = "i586"
  CONFIG["host_vendor"] = "pc"
  CONFIG["host_os"] = "mingw32msvc"
  CONFIG["target"] = "i386-pc-mingw32"
  CONFIG["target_cpu"] = "i386"
  CONFIG["target_vendor"] = "pc"
  CONFIG["target_os"] = "mingw32"
  CONFIG["CC"] = "#{mingwpre}-gcc"
  CONFIG["CFLAGS"] = "-g -O2 "
  CONFIG["LDFLAGS"] = ""
  CONFIG["CPPFLAGS"] = ""
  CONFIG["OBJEXT"] = "o"
  CONFIG["CPP"] = "#{mingwpre}-gcc -E"
  CONFIG["EGREP"] = "grep -E"
  CONFIG["GNU_LD"] = "yes"
  CONFIG["CPPOUTFILE"] = "-o conftest.i"
  CONFIG["OUTFLAG"] = "-o "
  CONFIG["YACC"] = "bison -y"
  CONFIG["RANLIB"] = "#{mingwpre}-ranlib"
  CONFIG["AR"] = "#{mingwpre}-ar"
  CONFIG["NM"] = "#{mingwpre}-nm"
  CONFIG["WINDRES"] = "#{mingwpre}-windres"
  CONFIG["DLLWRAP"] = "#{mingwpre}-dllwrap"
  CONFIG["OBJDUMP"] = "#{mingwpre}-objdump"
  CONFIG["LN_S"] = "ln -s"
  CONFIG["SET_MAKE"] = ""
  CONFIG["INSTALL_PROGRAM"] = "$(INSTALL)"
  CONFIG["INSTALL_SCRIPT"] = "$(INSTALL)"
  CONFIG["INSTALL_DATA"] = "$(INSTALL) -m 644"
  CONFIG["RM"] = "rm -f"
  CONFIG["CP"] = "cp"
  CONFIG["MAKEDIRS"] = "mkdir -p"
  CONFIG["LIBOBJS"] = " fileblocks$(U).o crypt$(U).o flock$(U).o acosh$(U).o win32$(U).o"
  CONFIG["ALLOCA"] = ""
  CONFIG["DLDFLAGS"] = " -Wl,--enable-auto-import,--export-all"
  CONFIG["ARCH_FLAG"] = ""
  CONFIG["STATIC"] = ""
  CONFIG["CCDLFLAGS"] = ""
  CONFIG["LDSHARED"] = "#{mingwpre}-gcc -shared -s"
  CONFIG["DLEXT"] = "so"
  CONFIG["DLEXT2"] = "dll"
  CONFIG["LIBEXT"] = "a"
  CONFIG["LINK_SO"] = ""
  CONFIG["LIBPATHFLAG"] = " -L\"%s\""
  CONFIG["RPATHFLAG"] = ""
  CONFIG["LIBPATHENV"] = ""
  CONFIG["TRY_LINK"] = ""
  CONFIG["STRIP"] = "strip"
  CONFIG["EXTSTATIC"] = ""
  CONFIG["setup"] = "Setup"
  CONFIG["MINIRUBY"] = "ruby -rfake"
  CONFIG["PREP"] = "fake.rb"
  CONFIG["RUNRUBY"] = "$(MINIRUBY) -I`cd $(srcdir)/lib; pwd`"
  CONFIG["EXTOUT"] = ".ext"
  CONFIG["ARCHFILE"] = ""
  CONFIG["RDOCTARGET"] = ""
  CONFIG["XCFLAGS"] = " -DRUBY_EXPORT"
  CONFIG["XLDFLAGS"] = " -Wl,--stack,0x02000000 -L."
  CONFIG["LIBRUBY_LDSHARED"] = "#{mingwpre}-gcc -shared -s"
  CONFIG["LIBRUBY_DLDFLAGS"] = " -Wl,--enable-auto-import,--export-all -Wl,--out-implib=$(LIBRUBY)"
  CONFIG["rubyw_install_name"] = "rubyw"
  CONFIG["RUBYW_INSTALL_NAME"] = "rubyw"
  CONFIG["LIBRUBY_A"] = "lib$(RUBY_SO_NAME)-static.a"
  CONFIG["LIBRUBY_SO"] = "$(RUBY_SO_NAME).dll"
  CONFIG["LIBRUBY_ALIASES"] = ""
  CONFIG["LIBRUBY"] = "lib$(LIBRUBY_SO).a"
  CONFIG["LIBRUBYARG"] = "$(LIBRUBYARG_SHARED)"
  CONFIG["LIBRUBYARG_STATIC"] = "-l$(RUBY_SO_NAME)-static"
  CONFIG["LIBRUBYARG_SHARED"] = "-l$(RUBY_SO_NAME)"
  CONFIG["SOLIBS"] = "$(LIBS)"
  CONFIG["DLDLIBS"] = ""
  CONFIG["ENABLE_SHARED"] = "yes"
  CONFIG["MAINLIBS"] = ""
  CONFIG["COMMON_LIBS"] = "m"
  CONFIG["COMMON_MACROS"] = ""
  CONFIG["COMMON_HEADERS"] = "windows.h winsock.h"
  CONFIG["EXPORT_PREFIX"] = ""
  CONFIG["MINIOBJS"] = "dmydln.o"
  CONFIG["MAKEFILES"] = "Makefile GNUmakefile"
  CONFIG["arch"] = "i386-mingw32"
  CONFIG["sitearch"] = "i386-msvcrt"
  CONFIG["sitedir"] = "$(prefix)/lib/ruby/site_ruby"
  CONFIG["configure_args"] = "'--host=#{mingwpre}' '--target=i386-mingw32' '--build=i686-linux' '--prefix=#{mingw32}' 'build_alias=i686-linux' 'host_alias=#{mingwpre}' 'target_alias=i386-mingw32'"
  CONFIG["NROFF"] = "/usr/bin/nroff"
  CONFIG["MANTYPE"] = "doc"
  CONFIG["LTLIBOBJS"] = " fileblocks$(U).lo crypt$(U).lo flock$(U).lo acosh$(U).lo win32$(U).lo"
  CONFIG["ruby_version"] = "$(MAJOR).$(MINOR)"
  CONFIG["rubylibdir"] = "$(libdir)/ruby/$(ruby_version)"
  CONFIG["archdir"] = "$(rubylibdir)/$(arch)"
  CONFIG["sitelibdir"] = "$(sitedir)/$(ruby_version)"
  CONFIG["sitearchdir"] = "$(sitelibdir)/$(sitearch)"
  CONFIG["topdir"] = File.dirname(__FILE__)
  MAKEFILE_CONFIG = {}
  CONFIG.each{|k,v| MAKEFILE_CONFIG[k] = v.dup}
  def Config::expand(val, config = CONFIG)
    val.gsub!(/\$\$|\$\(([^()]+)\)|\$\{([^{}]+)\}/) do |var|
      if !(v = $1 || $2)
        '$'
      elsif key = config[v = v[/\A[^:]+(?=(?::(.*?)=(.*))?\z)/]]
        pat, sub = $1, $2
        config[v] = false
        Config::expand(key, config)
        config[v] = key
        key = key.gsub(/#{Regexp.quote(pat)}(?=\s|\z)/n) {sub} if pat
        key
      else
        var
      end
    end
    val
  end
  CONFIG.each_value do |val|
    Config::expand(val)
  end
end
RbConfig = Config # compatibility for ruby-1.9
CROSS_COMPILING = nil unless defined? CROSS_COMPILING
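`Config::expand` resolves makefile-style references such as `$(prefix)` inside the `CONFIG` values. The following is a simplified, non-mutating sketch of the same substitution: the method `expand_vars` and the sample hash are illustrative assumptions, and the sketch leaves out two things the original does, the `$(var:pat=sub)` suffix replacement and the guard against self-referencing variables.

```ruby
# Simplified sketch: replaces $(name) and ${name} with the recursively
# expanded value from +config+, and "$$" with a literal "$".
def expand_vars(val, config)
  val.gsub(/\$\$|\$\(([^()]+)\)|\$\{([^{}]+)\}/) do |var|
    name = $1 || $2
    if name.nil?
      "$"                                   # "$$" escapes a dollar sign
    elsif config.key?(name)
      expand_vars(config[name], config)     # values may reference other keys
    else
      var                                   # unknown variables stay untouched
    end
  end
end

SAMPLE_CONFIG = {
  "prefix"      => "/opt/ruby-mingw32",
  "exec_prefix" => "$(prefix)",
  "bindir"      => "$(exec_prefix)/bin",
}

RUBY_PATH = expand_vars("$(bindir)/ruby", SAMPLE_CONFIG)
```

Here `RUBY_PATH` resolves through `bindir`, `exec_prefix`, and `prefix` in turn. Entries such as `$(U)` in `LIBOBJS`, which have no matching key, would pass through unchanged.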
@ -0,0 +1,3 @@
require File.join(File.dirname(__FILE__), 'lib', 'hpricot')
@ -0,0 +1,26 @@
# == About hpricot.rb
#
# All of Hpricot's various parts are loaded when you use <tt>require 'hpricot'</tt>.
#
# * hpricot_scan: the scanner (a C extension for Ruby) which turns an HTML stream into tokens.
# * hpricot/parse.rb: uses the scanner to sort through tokens and give you back a complete document object.
# * hpricot/tag.rb: sets up objects for the various types of elements in an HTML document.
# * hpricot/modules.rb: categorizes the various elements using mixins.
# * hpricot/traverse.rb: methods for searching documents.
# * hpricot/elements.rb: methods for dealing with a group of elements as an Hpricot::Elements list.
# * hpricot/inspect.rb: methods for displaying documents in a readable form.

# If available, Nikolai's UTF-8 library will ease use of utf-8 documents.
# See http://git.bitwi.se/ruby-character-encodings.git/.
begin
  require 'encoding/character/utf-8'
rescue LoadError
end

require 'hpricot_scan'
require 'hpricot/tag'
require 'hpricot/modules'
require 'hpricot/traverse'
require 'hpricot/inspect'
require 'hpricot/parse'
require 'hpricot/builder'
@ -0,0 +1,63 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
#--
|
||||||
|
# Copyright 2004 by Jim Weirich (jim@weirichhouse.org).
|
||||||
|
# All rights reserved.
|
||||||
|
|
||||||
|
# Permission is granted for use, copying, modification, distribution,
|
||||||
|
# and distribution of modified versions of this work as long as the
|
||||||
|
# above copyright notice is included.
|
||||||
|
#++
|
||||||
|
|
||||||
|
module Hpricot
|
||||||
|
|
||||||
|
# BlankSlate provides an abstract base class with no predefined
|
||||||
|
# methods (except for <tt>\_\_send__</tt> and <tt>\_\_id__</tt>).
|
||||||
|
# BlankSlate is useful as a base class when writing classes that
|
||||||
|
# depend upon <tt>method_missing</tt> (e.g. dynamic proxies).
|
||||||
|
class BlankSlate
|
||||||
|
class << self
|
||||||
|
|
||||||
|
# Hide the method named +name+ in the BlankSlate class. Don't
|
||||||
|
# hide +instance_eval+ or any method beginning with "__".
|
||||||
|
def hide(name)
|
||||||
|
undef_method name if
|
||||||
|
instance_methods.include?(name.to_s) and
|
||||||
|
name !~ /^(__|instance_eval)/
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
instance_methods.each { |m| hide(m) }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# Since Ruby is very dynamic, methods added to the ancestors of
|
||||||
|
# BlankSlate <em>after BlankSlate is defined</em> will show up in the
|
||||||
|
# list of available BlankSlate methods. We handle this by defining a
|
||||||
|
# hook in the Object and Kernel classes that will hide any defined
|
||||||
|
module Kernel
|
||||||
|
class << self
|
||||||
|
alias_method :hpricot_slate_method_added, :method_added
|
||||||
|
|
||||||
|
# Detect method additions to Kernel and remove them in the
|
||||||
|
# BlankSlate class.
|
||||||
|
def method_added(name)
|
||||||
|
hpricot_slate_method_added(name)
|
||||||
|
return if self != Kernel
|
||||||
|
Hpricot::BlankSlate.hide(name)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
class Object
|
||||||
|
class << self
|
||||||
|
alias_method :hpricot_slate_method_added, :method_added
|
||||||
|
|
||||||
|
# Detect method additions to Object and remove them in the
|
||||||
|
# BlankSlate class.
|
||||||
|
def method_added(name)
|
||||||
|
hpricot_slate_method_added(name)
|
||||||
|
return if self != Object
|
||||||
|
Hpricot::BlankSlate.hide(name)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
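The blank-slate technique above can be sketched in plain Ruby. This is a minimal illustration, not Hpricot's code: `SketchSlate`, `KEEP`, and `__calls` are hypothetical names, and `object_id` is kept only to avoid a Ruby warning. It also assumes a Ruby where `instance_methods` returns symbols (1.9 and later), whereas the `include?(name.to_s)` check in the source assumes strings.

```ruby
# Hypothetical stand-in for a blank-slate base class (illustrative names only).
class SketchSlate
  # Methods we choose to keep: anything starting with "__", plus
  # instance_eval and object_id.
  KEEP = /\A(__|instance_eval\z|object_id\z)/

  # Undefine every inherited public/protected instance method not matched by KEEP.
  instance_methods.each do |m|
    undef_method(m) unless m.to_s =~ KEEP
  end

  # With almost nothing defined, nearly every call lands here.
  # Record the call and return self so calls can be chained.
  def method_missing(name, *args)
    (@calls ||= []) << [name, args]
    self
  end

  # Double-underscore name, so it follows the same "never hidden" convention.
  def __calls
    @calls || []
  end
end

slate = SketchSlate.new
slate.display      # normally Kernel#display; here it is just recorded
slate.title("x")   # never defined anywhere; also recorded
```

Because even `class` and `inspect` are undefined on such an object, it is best inspected only through the methods that were deliberately kept.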
@ -0,0 +1,200 @@
|
|||||||
|
require 'hpricot/tags'
|
||||||
|
require 'hpricot/xchar'
|
||||||
|
require 'hpricot/blankslate'
|
||||||
|
|
||||||
|
module Hpricot
|
||||||
|
def self.build(ele = Doc.new, assigns = {}, &blk)
|
||||||
|
ele.extend Builder
|
||||||
|
assigns.each do |k, v|
|
||||||
|
ele.instance_variable_set("@#{k}", v)
|
||||||
|
end
|
||||||
|
ele.instance_eval &blk
|
||||||
|
ele
|
||||||
|
end
|
||||||
|
|
||||||
|
module Builder
|
||||||
|
|
||||||
|
@@default = {
|
||||||
|
:indent => 0,
|
||||||
|
:output_helpers => true,
|
||||||
|
:output_xml_instruction => true,
|
||||||
|
:output_meta_tag => true,
|
||||||
|
:auto_validation => true,
|
||||||
|
:tagset => Hpricot::XHTMLTransitional,
|
||||||
|
:root_attributes => {
|
||||||
|
:xmlns => 'http://www.w3.org/1999/xhtml', :'xml:lang' => 'en', :lang => 'en'
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
def self.set(option, value)
|
||||||
|
@@default[option] = value
|
||||||
|
end
|
||||||
|
|
||||||
|
# Write a +string+ to the HTML stream, making sure to escape it.
|
||||||
|
def text!(string)
|
||||||
|
@children << Text.new(Hpricot.xs(string))
|
||||||
|
end
|
||||||
|
|
||||||
|
# Write a +string+ to the HTML stream without escaping it.
|
||||||
|
def text(string)
|
||||||
|
@children << Text.new(string)
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
alias_method :<<, :text
|
||||||
|
alias_method :concat, :text
|
||||||
|
|
||||||
|
# Create a tag named +tag+. Other than the first argument, which is the tag name,
|
||||||
|
# the arguments are the same as the tags implemented via method_missing.
|
||||||
|
def tag!(tag, *args, &block)
|
||||||
|
ele_id = nil
|
||||||
|
if @auto_validation and @tagset
|
||||||
|
if !@tagset.tagset.has_key?(tag)
|
||||||
|
raise InvalidXhtmlError, "no element `#{tag}' for #{@tagset.doctype}"
|
||||||
|
elsif args.last.respond_to?(:to_hash)
|
||||||
|
attrs = args.last.to_hash
|
||||||
|
|
||||||
|
if @tagset.forms.include?(tag) and attrs[:id]
|
||||||
|
attrs[:name] ||= attrs[:id]
|
||||||
|
end
|
||||||
|
|
||||||
|
attrs.each do |k, v|
|
||||||
|
atname = k.to_s.downcase.intern
|
||||||
|
unless k =~ /:/ or @tagset.tagset[tag].include? atname
|
||||||
|
raise InvalidXhtmlError, "no attribute `#{k}' on #{tag} elements"
|
||||||
|
end
|
||||||
|
if atname == :id
|
||||||
|
ele_id = v.to_s
|
||||||
|
if @elements.has_key? ele_id
|
||||||
|
raise InvalidXhtmlError, "id `#{ele_id}' already used (id's must be unique)."
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# turn arguments into children or attributes
|
||||||
|
childs = []
|
||||||
|
attrs = args.grep(Hash)
|
||||||
|
childs.concat((args - attrs).map do |x|
|
||||||
|
if x.respond_to? :to_html
|
||||||
|
Hpricot.make(x.to_html)
|
||||||
|
elsif x
|
||||||
|
Text.new(Hpricot.xs(x))
|
||||||
|
end
|
||||||
|
end.flatten)
|
||||||
|
attrs = attrs.inject({}) do |hsh, ath|
|
||||||
|
ath.each do |k, v|
|
||||||
|
hsh[k] = Hpricot.xs(v.to_s) if v
|
||||||
|
end
|
||||||
|
hsh
|
||||||
|
end
|
||||||
|
|
||||||
|
# create the element itself
|
||||||
|
f = Elem.new(STag.new(tag, attrs), childs, ETag.new(tag))
|
||||||
|
|
||||||
|
# build children from the block
|
||||||
|
if block
|
||||||
|
build(f, &block)
|
||||||
|
end
|
||||||
|
|
||||||
|
@children << f
|
||||||
|
f
|
||||||
|
end
|
||||||
|
|
||||||
|
def build(*a, &b)
|
||||||
|
Hpricot.build(*a, &b)
|
||||||
|
end
|
||||||
|
|
||||||
|
# Every HTML tag method goes through an html_tag call. So, calling <tt>div</tt> is equivalent
|
||||||
|
# to calling <tt>html_tag(:div)</tt>. All HTML tags in Hpricot's list are given generated wrappers
|
||||||
|
# for this method.
|
||||||
|
#
|
||||||
|
# If the @auto_validation setting is on, this method will check for many common mistakes which
|
||||||
|
# could lead to invalid XHTML.
|
||||||
|
def html_tag(sym, *args, &block)
|
||||||
|
if @auto_validation and @tagset.self_closing.include?(sym) and block
|
||||||
|
raise InvalidXhtmlError, "the `#{sym}' element is self-closing, please remove the block"
|
||||||
|
elsif args.empty? and block.nil?
|
||||||
|
CssProxy.new(self, sym)
|
||||||
|
else
|
||||||
|
tag!(sym, *args, &block)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
XHTMLTransitional.tags.each do |k|
|
||||||
|
class_eval %{
|
||||||
|
def #{k}(*args, &block)
|
||||||
|
html_tag(#{k.inspect}, *args, &block)
|
||||||
|
end
|
||||||
|
}
|
||||||
|
end
|
||||||
|
|
||||||
|
def doctype(target, pub, sys)
|
||||||
|
@children << DocType.new(target, pub, sys)
|
||||||
|
end
|
||||||
|
|
||||||
|
remove_method :head
|
||||||
|
|
||||||
|
# Builds a head tag. Adds a <tt>meta</tt> tag inside with Content-Type
|
||||||
|
# set to <tt>text/html; charset=utf-8</tt>.
|
||||||
|
def head(*args, &block)
|
||||||
|
tag!(:head, *args) do
|
||||||
|
tag!(:meta, "http-equiv" => "Content-Type", "content" => "text/html; charset=utf-8") if @output_meta_tag
|
||||||
|
instance_eval(&block)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# Builds an html tag. An XML 1.0 instruction and an XHTML 1.0 Transitional doctype
|
||||||
|
# are prepended. Also assumes <tt>:xmlns => "http://www.w3.org/1999/xhtml",
|
||||||
|
# :lang => "en"</tt>.
|
||||||
|
def xhtml_transitional(attrs = {}, &block)
|
||||||
|
# self.tagset = Hpricot::XHTMLTransitional
|
||||||
|
xhtml_html(attrs, &block)
|
||||||
|
end
|
||||||
|
|
||||||
|
# Builds an html tag with XHTML 1.0 Strict doctype instead.
|
||||||
|
def xhtml_strict(attrs = {}, &block)
|
||||||
|
# self.tagset = Hpricot::XHTMLStrict
|
||||||
|
xhtml_html(attrs, &block)
|
||||||
|
end
|
||||||
|
|
||||||
|
private
|
||||||
|
|
||||||
|
def xhtml_html(attrs = {}, &block)
|
||||||
|
instruct! if @output_xml_instruction
|
||||||
|
doctype(:html, *@@default[:tagset].doctype)
|
||||||
|
tag!(:html, @@default[:root_attributes].merge(attrs), &block)
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
# Class used by Markaby::Builder to store element options. Methods called
|
||||||
|
# against the CssProxy object are added as element classes or IDs.
|
||||||
|
#
|
||||||
|
# See the README for examples.
|
||||||
|
class CssProxy < BlankSlate
|
||||||
|
|
||||||
|
# Creates a CssProxy object.
|
||||||
|
def initialize(builder, sym)
|
||||||
|
@builder, @sym, @attrs = builder, sym, {}
|
||||||
|
end
|
||||||
|
|
||||||
|
# Adds attributes to an element. Bang methods set the :id attribute.
|
||||||
|
# Other methods add to the :class attribute.
|
||||||
|
def method_missing(id_or_class, *args, &block)
|
||||||
|
if (idc = id_or_class.to_s) =~ /!$/
|
||||||
|
@attrs[:id] = $`
|
||||||
|
else
|
||||||
|
@attrs[:class] = @attrs[:class].nil? ? idc : "#{@attrs[:class]} #{idc}".strip
|
||||||
|
end
|
||||||
|
|
||||||
|
if block or args.any?
|
||||||
|
args.push(@attrs)
|
||||||
|
return @builder.tag!(@sym, *args, &block)
|
||||||
|
end
|
||||||
|
|
||||||
|
return self
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
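The CssProxy idea (bang methods set the id, other methods accumulate classes, and the tag is only emitted once arguments or a block arrive) can be sketched without Hpricot. This is a hedged illustration: `SketchBuilder` and `SketchProxy` are hypothetical names, and the builder merely records calls rather than creating elements.

```ruby
# Hypothetical recorder standing in for the real builder (not Hpricot's API).
class SketchBuilder
  attr_reader :tags

  def initialize
    @tags = []
  end

  # Store the tag name and its arguments instead of building an element.
  def tag!(sym, *args)
    @tags << [sym, *args]
    @tags.last
  end
end

# Hypothetical proxy; BasicObject plays the role of a blank slate here.
class SketchProxy < BasicObject
  def initialize(builder, sym)
    @builder, @sym, @attrs = builder, sym, {}
  end

  def method_missing(id_or_class, *args, &block)
    name = id_or_class.to_s
    if name.end_with?("!")
      @attrs[:id] = name.chomp("!")                           # div.main!  -> id="main"
    else
      @attrs[:class] = [@attrs[:class], name].compact.join(" ") # div.wide -> class="wide"
    end
    # Arguments or a block mean the chain is finished: emit the tag.
    return @builder.tag!(@sym, *args, @attrs, &block) if block || !args.empty?
    self
  end
end

builder = SketchBuilder.new
SketchProxy.new(builder, :div).main!.wide.padded("hello")
```

Returning `self` from the intermediate calls is what lets the id and class names chain before the final call carries the content.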
@ -0,0 +1,510 @@
|
|||||||
|
module Hpricot
|
||||||
|
# Once you've matched a list of elements, you will often need to handle them as
|
||||||
|
# a group. Or you may want to perform the same action on each of them.
|
||||||
|
# Hpricot::Elements is an extension of Ruby's array class, with some methods
|
||||||
|
# added for altering elements contained in the array.
|
||||||
|
#
|
||||||
|
# If you need to create an element array from regular elements:
|
||||||
|
#
|
||||||
|
# Hpricot::Elements[ele1, ele2, ele3]
|
||||||
|
#
|
||||||
|
# Assuming that ele1, ele2 and ele3 contain element objects (Hpricot::Elem,
|
||||||
|
# Hpricot::Doc, etc.)
|
||||||
|
#
|
||||||
|
# == Continuing Searches
|
||||||
|
#
|
||||||
|
# Usually the Hpricot::Elements you're working on comes from a search you've
|
||||||
|
# done. Well, you can continue searching the list by using the same <tt>at</tt>
|
||||||
|
# and <tt>search</tt> methods you can use on plain elements.
|
||||||
|
#
|
||||||
|
# elements = doc.search("/div/p")
|
||||||
|
# elements = elements.search("/a[@href='http://hoodwink.d/']")
|
||||||
|
# elements = elements.at("img")
|
||||||
|
#
|
||||||
|
# == Altering Elements
|
||||||
|
#
|
||||||
|
# When you're altering elements in the list, your changes will be reflected in
|
||||||
|
# the document you started searching from.
|
||||||
|
#
|
||||||
|
# doc = Hpricot("That's my <b>spoon</b>, Tyler.")
|
||||||
|
# doc.at("b").swap("<i>fork</i>")
|
||||||
|
# doc.to_html
|
||||||
|
# #=> "That's my <i>fork</i>, Tyler."
|
||||||
|
#
|
||||||
|
# == Getting More Detailed
|
||||||
|
#
|
||||||
|
# If you can't find a method here that does what you need, you may need to
|
||||||
|
# loop through the elements and find a method in Hpricot::Container::Trav
|
||||||
|
# which can do what you need.
|
||||||
|
#
|
||||||
|
# For example, you may want to search for all the H3 header tags in a document
|
||||||
|
# and grab all the tags underneath the header, but not inside the header.
|
||||||
|
# A good method for this is <tt>next_sibling</tt>:
|
||||||
|
#
|
||||||
|
# doc.search("h3").each do |h3|
|
||||||
|
# while ele = h3.next_sibling
|
||||||
|
# ary << ele # stuff away all the elements under the h3
|
||||||
|
# end
|
||||||
|
# end
|
||||||
|
#
|
||||||
|
# Most of the useful element methods are in the mixins Hpricot::Traverse
|
||||||
|
# and Hpricot::Container::Trav.
|
||||||
|
class Elements < Array
|
||||||
|
|
||||||
|
# Searches this list for any elements (or children of these elements) matching
|
||||||
|
# the CSS or XPath expression +expr+. Root is assumed to be the element scanned.
|
||||||
|
#
|
||||||
|
# See Hpricot::Container::Trav.search for more.
|
||||||
|
def search(*expr,&blk)
|
||||||
|
Elements[*map { |x| x.search(*expr,&blk) }.flatten.uniq]
|
||||||
|
end
|
||||||
|
alias_method :/, :search
|
||||||
|
|
||||||
|
# Searches this list for the first element (or child of these elements) matching
|
||||||
|
# the CSS or XPath expression +expr+. Root is assumed to be the element scanned.
|
||||||
|
#
|
||||||
|
# See Hpricot::Container::Trav.at for more.
|
||||||
|
def at(expr, &blk)
|
||||||
|
search(expr, &blk).first
|
||||||
|
end
|
||||||
|
alias_method :%, :at
|
||||||
|
|
||||||
|
# Convert this group of elements into a complete HTML fragment, returned as a
|
||||||
|
# string.
|
||||||
|
def to_html
|
||||||
|
map { |x| x.output("") }.join
|
||||||
|
end
|
||||||
|
alias_method :to_s, :to_html
|
||||||
|
|
||||||
|
# Returns an HTML fragment built of the contents of each element in this list.
|
||||||
|
#
|
||||||
|
# If an HTML +string+ is supplied, this method acts like inner_html=.
|
||||||
|
def inner_html(*string)
|
||||||
|
if string.empty?
|
||||||
|
map { |x| x.inner_html }.join
|
||||||
|
else
|
||||||
|
x = self.inner_html = string.pop || x
|
||||||
|
end
|
||||||
|
end
|
||||||
|
alias_method :html, :inner_html
|
||||||
|
alias_method :innerHTML, :inner_html
|
||||||
|
|
||||||
|
# Replaces the contents of each element in this list. Supply an HTML +string+,
|
||||||
|
# which is loaded into Hpricot objects and inserted into every element in this
|
||||||
|
# list.
|
||||||
|
def inner_html=(string)
|
||||||
|
each { |x| x.inner_html = string }
|
||||||
|
end
|
||||||
|
alias_method :html=, :inner_html=
|
||||||
|
alias_method :innerHTML=, :inner_html=
|
||||||
|
|
||||||
|
# Returns a string containing the text contents of each element in this list.
|
||||||
|
# All HTML tags are removed.
|
||||||
|
def inner_text
|
||||||
|
map { |x| x.inner_text }.join
|
||||||
|
end
|
||||||
|
alias_method :text, :inner_text
|
||||||
|
|
||||||
|
# Remove all elements in this list from the document which contains them.
|
||||||
|
#
|
||||||
|
# doc = Hpricot("<html>Remove this: <b>here</b></html>")
|
||||||
|
# doc.search("b").remove
|
||||||
|
# doc.to_html
|
||||||
|
# => "<html>Remove this: </html>"
|
||||||
|
#
|
||||||
|
def remove
|
||||||
|
each { |x| x.parent.children.delete(x) }
|
||||||
|
end
|
||||||
|
|
||||||
|
# Empty the elements in this list, by removing their insides.
|
||||||
|
#
|
||||||
|
# doc = Hpricot("<p> We have <i>so much</i> to say.</p>")
|
||||||
|
# doc.search("i").empty
|
||||||
|
# doc.to_html
|
||||||
|
# => "<p> We have <i></i> to say.</p>"
|
||||||
|
#
|
||||||
|
def empty
|
||||||
|
each { |x| x.inner_html = nil }
|
||||||
|
end
|
||||||
|
|
||||||
|
# Add to the end of the contents inside each element in this list.
|
||||||
|
# Pass in an HTML +str+, which is turned into Hpricot elements.
|
||||||
|
def append(str = nil, &blk)
|
||||||
|
each { |x| x.html(x.children + Hpricot.make(str, &blk)) }
|
||||||
|
end
|
||||||
|
|
||||||
|
# Add to the start of the contents inside each element in this list.
|
||||||
|
# Pass in an HTML +str+, which is turned into Hpricot elements.
|
||||||
|
def prepend(str = nil, &blk)
|
||||||
|
each { |x| x.html(Hpricot.make(str, &blk) + x.children) }
|
||||||
|
end
|
||||||
|
|
||||||
|
# Add some HTML just before each element in this list.
|
||||||
|
# Pass in an HTML +str+, which is turned into Hpricot elements.
|
||||||
|
def before(str = nil, &blk)
|
||||||
|
each { |x| x.parent.insert_before Hpricot.make(str, &blk), x }
|
||||||
|
end
|
||||||
|
|
||||||
|
# Just after each element in this list, add some HTML.
|
||||||
|
# Pass in an HTML +str+, which is turned into Hpricot elements.
|
||||||
|
def after(str = nil, &blk)
|
||||||
|
each { |x| x.parent.insert_after Hpricot.make(str, &blk), x }
|
||||||
|
end
|
||||||
|
|
||||||
|
# Wraps each element in the list inside the element created by HTML +str+.
|
||||||
|
# If more than one element is found in the string, Hpricot locates the
|
||||||
|
# deepest spot inside the first element.
|
||||||
|
#
|
||||||
|
# doc.search("a[@href]").
|
||||||
|
# wrap(%{<div class="link"><div class="link_inner"></div></div>})
|
||||||
|
#
|
||||||
|
# This code wraps every link on the page inside a +div.link+ and a +div.link_inner+ nest.
|
||||||
|
def wrap(str = nil, &blk)
|
||||||
|
each do |x|
|
||||||
|
wrap = Hpricot.make(str, &blk)
|
||||||
|
nest = wrap.detect { |w| w.respond_to? :children }
|
||||||
|
unless nest
|
||||||
|
raise Exception, "No wrapping element found."
|
||||||
|
end
|
||||||
|
x.parent.replace_child(x, wrap)
|
||||||
|
nest = nest.children.first until nest.empty?
|
||||||
|
nest.html(nest.children + [x])
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# Gets and sets attributes on all matched elements.
|
||||||
|
#
|
||||||
|
# Pass in a +key+ on its own and this method will return the string value
|
||||||
|
# assigned to that attribute for the first element, or +nil+ if the
|
||||||
|
# attribute isn't found.
|
||||||
|
#
|
||||||
|
# doc.search("a").attr("href")
|
||||||
|
# #=> "http://hacketyhack.net/"
|
||||||
|
#
|
||||||
|
# Or, pass in a +key+ and +value+. This will set an attribute for all
|
||||||
|
# matched elements.
|
||||||
|
#
|
||||||
|
# doc.search("p").attr("class", "basic")
|
||||||
|
#
|
||||||
|
# You may also use a Hash to set a series of attributes:
|
||||||
|
#
|
||||||
|
# (doc/"a").attr(:class => "basic", :href => "http://hackety.org/")
|
||||||
|
#
|
||||||
|
# Lastly, a block can be used to rewrite an attribute based on the element
|
||||||
|
# it belongs to. The block will pass in an element. Return from the block
|
||||||
|
# the new value of the attribute.
|
||||||
|
#
|
||||||
|
# records.attr("href") { |e| e['href'] + "#top" }
|
||||||
|
#
|
||||||
|
# This example adds a <tt>#top</tt> anchor to each link.
|
||||||
|
#
|
||||||
|
def attr key, value = nil, &blk
|
||||||
|
if value or blk
|
||||||
|
each do |el|
|
||||||
|
el.set_attribute(key, value || blk[el])
|
||||||
|
end
|
||||||
|
return self
|
||||||
|
end
|
||||||
|
if key.is_a? Hash
|
||||||
|
key.each { |k,v| self.attr(k,v) }
|
||||||
|
return self
|
||||||
|
else
|
||||||
|
return self[0].get_attribute(key)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
alias_method :set, :attr
|
||||||
|
|
||||||
|
# Adds the class to all matched elements.
|
||||||
|
#
|
||||||
|
# (doc/"p").add_class("bacon")
|
||||||
|
#
|
||||||
|
# Now all paragraphs will have class="bacon".
|
||||||
|
def add_class class_name
|
||||||
|
each do |el|
|
||||||
|
next unless el.respond_to? :get_attribute
|
||||||
|
classes = el.get_attribute('class').to_s.split(" ")
|
||||||
|
el.set_attribute('class', classes.push(class_name).uniq.join(" "))
|
||||||
|
end
|
||||||
|
self
|
||||||
|
end
|
||||||
|
|
||||||
|
# Remove an attribute from each of the matched elements.
|
||||||
|
#
|
||||||
|
# (doc/"input").remove_attr("disabled")
|
||||||
|
#
|
||||||
|
def remove_attr name
|
||||||
|
each do |el|
|
||||||
|
next unless el.respond_to? :remove_attribute
|
||||||
|
el.remove_attribute(name)
|
||||||
|
end
|
||||||
|
self
|
||||||
|
end
|
||||||
|
|
||||||
|
# Removes a class from all matched elements.
|
||||||
|
#
|
||||||
|
# (doc/"span").remove_class("lightgrey")
|
||||||
|
#
|
||||||
|
# Or, to remove all classes:
|
||||||
|
#
|
||||||
|
# (doc/"span").remove_class
|
||||||
|
#
|
||||||
|
def remove_class name = nil
|
||||||
|
each do |el|
|
||||||
|
next unless el.respond_to? :get_attribute
|
||||||
|
if name
|
||||||
|
classes = el.get_attribute('class').to_s.split(" ")
|
||||||
|
el.set_attribute('class', (classes - [name]).uniq.join(" "))
|
||||||
|
else
|
||||||
|
el.remove_attribute("class")
|
||||||
|
end
|
||||||
|
end
|
||||||
|
self
|
||||||
|
end
|
||||||
|
|
||||||
|
ATTR_RE = %r!\[ *(?:(@)([\w\(\)-]+)|([\w\(\)-]+\(\))) *([~\!\|\*$\^=]*) *'?"?([^\]'"]*)'?"? *\]!i
|
||||||
|
BRACK_RE = %r!(\[) *([^\]]*) *\]+!i
|
||||||
|
FUNC_RE = %r!(:)?([a-zA-Z0-9\*_-]*)\( *[\"']?([^ \)]*?)['\"]? *\)!
|
||||||
|
CUST_RE = %r!(:)([a-zA-Z0-9\*_-]*)()!
|
||||||
|
CATCH_RE = %r!([:\.#]*)([a-zA-Z0-9\*_-]+)!
|
||||||
|
|
||||||
|
def self.filter(nodes, expr, truth = true)
|
||||||
|
until expr.empty?
|
||||||
|
_, *m = *expr.match(/^(?:#{ATTR_RE}|#{BRACK_RE}|#{FUNC_RE}|#{CUST_RE}|#{CATCH_RE})/)
|
||||||
|
break unless _
|
||||||
|
|
||||||
|
expr = $'
|
||||||
|
m.compact!
|
||||||
|
if m[0] == '@'
|
||||||
|
m[0] = "@#{m.slice!(2,1)}"
|
||||||
|
end
|
||||||
|
|
||||||
|
if m[0] == '[' && m[1] =~ /^\d+$/
|
||||||
|
m = [":", "nth", m[1].to_i-1]
|
||||||
|
end
|
||||||
|
|
||||||
|
if m[0] == ":" && m[1] == "not"
|
||||||
|
nodes, = Elements.filter(nodes, m[2], false)
|
||||||
|
elsif "#{m[0]}#{m[1]}" =~ /^(:even|:odd)$/
|
||||||
|
new_nodes = []
|
||||||
|
nodes.each_with_index {|n,i| new_nodes.push(n) if (i % 2 == (m[1] == "even" ? 0 : 1)) }
|
||||||
|
nodes = new_nodes
|
||||||
|
elsif "#{m[0]}#{m[1]}" =~ /^(:first|:last)$/
|
||||||
|
nodes = [nodes.send(m[1])]
|
||||||
|
else
|
||||||
|
meth = "filter[#{m[0]}#{m[1]}]" unless m[0].empty?
|
||||||
|
if meth and Traverse.method_defined? meth
|
||||||
|
args = m[2..-1]
|
||||||
|
else
|
||||||
|
meth = "filter[#{m[0]}]"
|
||||||
|
if Traverse.method_defined? meth
|
||||||
|
args = m[1..-1]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
i = -1
|
||||||
|
nodes = Elements[*nodes.find_all do |x|
|
||||||
|
i += 1
|
||||||
|
x.send(meth, *([*args] + [i])) ? truth : !truth
|
||||||
|
end]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
[nodes, expr]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Given two elements, attempt to gather an Elements array of everything between
|
||||||
|
# (and including) those two elements.
|
||||||
|
def self.expand(ele1, ele2, excl=false)
|
||||||
|
ary = []
|
||||||
|
offset = excl ? -1 : 0
|
||||||
|
|
||||||
|
if ele1 and ele2
|
||||||
|
# let's quickly take care of siblings
|
||||||
|
if ele1.parent == ele2.parent
|
||||||
|
ary = ele1.parent.children[ele1.node_position..(ele2.node_position+offset)]
|
||||||
|
else
|
||||||
|
# find common parent
|
||||||
|
p, ele1_p = ele1, [ele1]
|
||||||
|
ele1_p.unshift p while p.respond_to?(:parent) and p = p.parent
|
||||||
|
p, ele2_p = ele2, [ele2]
|
||||||
|
ele2_p.unshift p while p.respond_to?(:parent) and p = p.parent
|
||||||
|
common_parent = ele1_p.zip(ele2_p).select { |p1, p2| p1 == p2 }.flatten.last
|
||||||
|
|
||||||
|
child = nil
|
||||||
|
if ele1 == common_parent
|
||||||
|
child = ele2
|
||||||
|
elsif ele2 == common_parent
|
||||||
|
child = ele1
|
||||||
|
end
|
||||||
|
|
||||||
|
if child
|
||||||
|
ary = common_parent.children[0..(child.node_position+offset)]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return Elements[*ary]
|
||||||
|
end
|
||||||
|
|
||||||
|
def filter(expr)
|
||||||
|
nodes, = Elements.filter(self, expr)
|
||||||
|
nodes
|
||||||
|
end
|
||||||
|
|
||||||
|
def not(expr)
|
||||||
|
if expr.is_a? Traverse
|
||||||
|
nodes = self - [expr]
|
||||||
|
else
|
||||||
|
nodes, = Elements.filter(self, expr, false)
|
||||||
|
end
|
||||||
|
nodes
|
||||||
|
end
|
||||||
|
|
||||||
|
private
|
||||||
|
def copy_node(node, l)
|
||||||
|
l.instance_variables.each do |iv|
|
||||||
|
node.instance_variable_set(iv, l.instance_variable_get(iv))
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
module Traverse
|
||||||
|
def self.filter(tok, &blk)
|
||||||
|
define_method("filter[#{tok.is_a?(String) ? tok : tok.inspect}]", &blk)
|
||||||
|
end
|
||||||
|
|
||||||
|
filter '' do |name,i|
|
||||||
|
name == '*' || (self.respond_to?(:name) && self.name.downcase == name.downcase)
|
||||||
|
end
|
||||||
|
|
||||||
|
filter '#' do |id,i|
|
||||||
|
self.elem? and get_attribute('id').to_s == id
|
||||||
|
end
|
||||||
|
|
||||||
|
filter '.' do |name,i|
|
||||||
|
self.elem? and classes.include? name
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :lt do |num,i|
|
||||||
|
self.position < num.to_i
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :gt do |num,i|
|
||||||
|
self.position > num.to_i
|
||||||
|
end
|
||||||
|
|
||||||
|
nth = proc { |num,i| self.position == num.to_i }
|
||||||
|
nth_first = proc { |*a| self.position == 0 }
|
||||||
|
nth_last = proc { |*a| self == parent.children_of_type(self.name).last }
|
||||||
|
|
||||||
|
filter :nth, &nth
|
||||||
|
filter :eq, &nth
|
||||||
|
filter ":nth-of-type", &nth
|
||||||
|
|
||||||
|
filter :first, &nth_first
|
||||||
|
filter ":first-of-type", &nth_first
|
||||||
|
|
||||||
|
filter :last, &nth_last
|
||||||
|
filter ":last-of-type", &nth_last
|
||||||
|
|
||||||
|
filter :even do |num,i|
|
||||||
|
self.position % 2 == 0
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :odd do |num,i|
|
||||||
|
self.position % 2 == 1
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ':first-child' do |i|
|
||||||
|
self == parent.containers.first
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ':nth-child' do |arg,i|
|
||||||
|
case arg
|
||||||
|
when 'even'; (parent.containers.index(self) + 1) % 2 == 0
|
||||||
|
when 'odd'; (parent.containers.index(self) + 1) % 2 == 1
|
||||||
|
else self == (parent.containers[arg.to_i - 1])
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ":last-child" do |i|
|
||||||
|
self == parent.containers.last
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ":nth-last-child" do |arg,i|
|
||||||
|
self == parent.containers[-1-arg.to_i]
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ":nth-last-of-type" do |arg,i|
|
||||||
|
self == parent.children_of_type(self.name)[-1-arg.to_i]
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ":only-of-type" do |arg,i|
|
||||||
|
parent.children_of_type(self.name).length == 1
|
||||||
|
end
|
||||||
|
|
||||||
|
filter ":only-child" do |arg,i|
|
||||||
|
parent.containers.length == 1
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :parent do
|
||||||
|
containers.length > 0
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :empty do
|
||||||
|
containers.length == 0
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :root do
|
||||||
|
self.is_a? Hpricot::Doc
|
||||||
|
end
|
||||||
|
|
||||||
|
filter 'text' do
|
||||||
|
self.text?
|
||||||
|
end
|
||||||
|
|
||||||
|
filter 'comment' do
|
||||||
|
self.comment?
|
||||||
|
end
|
||||||
|
|
||||||
|
filter :contains do |arg, ignore|
|
||||||
|
html.include? arg
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
pred_procs =
|
||||||
|
{'text()' => proc { |ele, *_| ele.inner_text.strip },
|
||||||
|
'@' => proc { |ele, attr, *_| ele.get_attribute(attr).to_s if ele.elem? }}
|
||||||
|
|
||||||
|
oper_procs =
|
||||||
|
{'=' => proc { |a,b| a == b },
|
||||||
|
'!=' => proc { |a,b| a != b },
|
||||||
|
'~=' => proc { |a,b| a.split(/\s+/).include?(b) },
|
||||||
|
'|=' => proc { |a,b| a =~ /^#{Regexp::quote b}(-|$)/ },
|
||||||
|
'^=' => proc { |a,b| a.index(b) == 0 },
|
||||||
|
'$=' => proc { |a,b| a =~ /#{Regexp::quote b}$/ },
|
||||||
|
'*=' => proc { |a,b| idx = a.index(b) }}
|
||||||
|
|
||||||
|
pred_procs.each do |pred_n, pred_f|
|
||||||
|
oper_procs.each do |oper_n, oper_f|
|
||||||
|
filter "#{pred_n}#{oper_n}" do |*a|
|
||||||
|
qual = pred_f[self, *a]
|
||||||
|
oper_f[qual, a[-2]] if qual
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
filter 'text()' do |val,i|
|
||||||
|
!self.inner_text.strip.empty?
|
||||||
|
end
|
||||||
|
|
||||||
|
filter '@' do |attr,val,i|
|
||||||
|
self.elem? and has_attribute? attr
|
||||||
|
end
|
||||||
|
|
||||||
|
filter '[' do |val,i|
|
||||||
|
self.elem? and search(val).length > 0
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
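The attribute-operator table near the end of the file (`=`, `!=`, `~=`, `|=`, `^=`, `$=`, `*=`) can be exercised on its own. The following is a minimal sketch under stated assumptions: `OPERATORS` and `attr_match?` are hypothetical names, attributes are modelled as a plain Hash, and the anchors use `\A` and `\z` rather than the `^` and `$` seen in the source.

```ruby
# Illustrative copy of the comparison table (hypothetical constant name).
OPERATORS = {
  "="  => proc { |a, b| a == b },                      # exact match
  "!=" => proc { |a, b| a != b },                      # anything but
  "~=" => proc { |a, b| a.split(/\s+/).include?(b) },  # one of the space-separated words
  "|=" => proc { |a, b| a.match?(/\A#{Regexp.quote(b)}(-|\z)/) }, # equal, or prefix followed by "-"
  "^=" => proc { |a, b| a.start_with?(b) },            # prefix
  "$=" => proc { |a, b| a.end_with?(b) },              # suffix
  "*=" => proc { |a, b| a.include?(b) }                # substring
}

# Hypothetical helper: look up the attribute, then apply the operator.
# A missing attribute never matches, mirroring the "if qual" guard in the source.
def attr_match?(attrs, name, oper, expected)
  actual = attrs[name]
  return false if actual.nil?
  OPERATORS.fetch(oper).call(actual.to_s, expected) ? true : false
end

link = { "href" => "http://hackety.org/", "class" => "nav main", "lang" => "en-US" }
attr_match?(link, "class", "~=", "main")
attr_match?(link, "lang", "|=", "en")
```

Keeping the operators in a Hash of procs means a new operator is one more entry, which is the same design the source uses when it crosses predicates with operators to generate its filters.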
@ -0,0 +1,672 @@
|
|||||||
|
module Hpricot
|
||||||
|
# The code below is auto-generated. Don't edit manually.
|
||||||
|
# :stopdoc:
|
||||||
|
NamedCharacters =
|
||||||
|
{"AElig"=>198, "Aacute"=>193, "Acirc"=>194, "Agrave"=>192, "Alpha"=>913,
|
||||||
|
"Aring"=>197, "Atilde"=>195, "Auml"=>196, "Beta"=>914, "Ccedil"=>199,
|
||||||
|
"Chi"=>935, "Dagger"=>8225, "Delta"=>916, "ETH"=>208, "Eacute"=>201,
|
||||||
|
"Ecirc"=>202, "Egrave"=>200, "Epsilon"=>917, "Eta"=>919, "Euml"=>203,
|
||||||
|
"Gamma"=>915, "Iacute"=>205, "Icirc"=>206, "Igrave"=>204, "Iota"=>921,
|
||||||
|
"Iuml"=>207, "Kappa"=>922, "Lambda"=>923, "Mu"=>924, "Ntilde"=>209, "Nu"=>925,
|
||||||
|
"OElig"=>338, "Oacute"=>211, "Ocirc"=>212, "Ograve"=>210, "Omega"=>937,
|
||||||
|
"Omicron"=>927, "Oslash"=>216, "Otilde"=>213, "Ouml"=>214, "Phi"=>934,
|
||||||
|
"Pi"=>928, "Prime"=>8243, "Psi"=>936, "Rho"=>929, "Scaron"=>352, "Sigma"=>931,
|
||||||
|
"THORN"=>222, "Tau"=>932, "Theta"=>920, "Uacute"=>218, "Ucirc"=>219,
|
||||||
|
"Ugrave"=>217, "Upsilon"=>933, "Uuml"=>220, "Xi"=>926, "Yacute"=>221,
|
||||||
|
"Yuml"=>376, "Zeta"=>918, "aacute"=>225, "acirc"=>226, "acute"=>180,
|
||||||
|
"aelig"=>230, "agrave"=>224, "alefsym"=>8501, "alpha"=>945, "amp"=>38,
|
||||||
|
"and"=>8743, "ang"=>8736, "apos"=>39, "aring"=>229, "asymp"=>8776,
|
||||||
|
"atilde"=>227, "auml"=>228, "bdquo"=>8222, "beta"=>946, "brvbar"=>166,
|
||||||
|
"bull"=>8226, "cap"=>8745, "ccedil"=>231, "cedil"=>184, "cent"=>162,
|
||||||
|
"chi"=>967, "circ"=>710, "clubs"=>9827, "cong"=>8773, "copy"=>169,
|
||||||
|
"crarr"=>8629, "cup"=>8746, "curren"=>164, "dArr"=>8659, "dagger"=>8224,
|
||||||
|
"darr"=>8595, "deg"=>176, "delta"=>948, "diams"=>9830, "divide"=>247,
|
||||||
|
"eacute"=>233, "ecirc"=>234, "egrave"=>232, "empty"=>8709, "emsp"=>8195,
|
||||||
|
"ensp"=>8194, "epsilon"=>949, "equiv"=>8801, "eta"=>951, "eth"=>240,
|
||||||
|
"euml"=>235, "euro"=>8364, "exist"=>8707, "fnof"=>402, "forall"=>8704,
|
||||||
|
"frac12"=>189, "frac14"=>188, "frac34"=>190, "frasl"=>8260, "gamma"=>947,
|
||||||
|
 "ge"=>8805, "gt"=>62, "hArr"=>8660, "harr"=>8596, "hearts"=>9829,
 "hellip"=>8230, "iacute"=>237, "icirc"=>238, "iexcl"=>161, "igrave"=>236,
 "image"=>8465, "infin"=>8734, "int"=>8747, "iota"=>953, "iquest"=>191,
 "isin"=>8712, "iuml"=>239, "kappa"=>954, "lArr"=>8656, "lambda"=>955,
 "lang"=>9001, "laquo"=>171, "larr"=>8592, "lceil"=>8968, "ldquo"=>8220,
 "le"=>8804, "lfloor"=>8970, "lowast"=>8727, "loz"=>9674, "lrm"=>8206,
 "lsaquo"=>8249, "lsquo"=>8216, "lt"=>60, "macr"=>175, "mdash"=>8212,
 "micro"=>181, "middot"=>183, "minus"=>8722, "mu"=>956, "nabla"=>8711,
 "nbsp"=>160, "ndash"=>8211, "ne"=>8800, "ni"=>8715, "not"=>172, "notin"=>8713,
 "nsub"=>8836, "ntilde"=>241, "nu"=>957, "oacute"=>243, "ocirc"=>244,
 "oelig"=>339, "ograve"=>242, "oline"=>8254, "omega"=>969, "omicron"=>959,
 "oplus"=>8853, "or"=>8744, "ordf"=>170, "ordm"=>186, "oslash"=>248,
 "otilde"=>245, "otimes"=>8855, "ouml"=>246, "para"=>182, "part"=>8706,
 "permil"=>8240, "perp"=>8869, "phi"=>966, "pi"=>960, "piv"=>982,
 "plusmn"=>177, "pound"=>163, "prime"=>8242, "prod"=>8719, "prop"=>8733,
 "psi"=>968, "quot"=>34, "rArr"=>8658, "radic"=>8730, "rang"=>9002,
 "raquo"=>187, "rarr"=>8594, "rceil"=>8969, "rdquo"=>8221, "real"=>8476,
 "reg"=>174, "rfloor"=>8971, "rho"=>961, "rlm"=>8207, "rsaquo"=>8250,
 "rsquo"=>8217, "sbquo"=>8218, "scaron"=>353, "sdot"=>8901, "sect"=>167,
 "shy"=>173, "sigma"=>963, "sigmaf"=>962, "sim"=>8764, "spades"=>9824,
 "sub"=>8834, "sube"=>8838, "sum"=>8721, "sup"=>8835, "sup1"=>185, "sup2"=>178,
 "sup3"=>179, "supe"=>8839, "szlig"=>223, "tau"=>964, "there4"=>8756,
 "theta"=>952, "thetasym"=>977, "thinsp"=>8201, "thorn"=>254, "tilde"=>732,
 "times"=>215, "trade"=>8482, "uArr"=>8657, "uacute"=>250, "uarr"=>8593,
 "ucirc"=>251, "ugrave"=>249, "uml"=>168, "upsih"=>978, "upsilon"=>965,
 "uuml"=>252, "weierp"=>8472, "xi"=>958, "yacute"=>253, "yen"=>165,
 "yuml"=>255, "zeta"=>950, "zwj"=>8205, "zwnj"=>8204}

NamedCharactersPattern = /\A(?-mix:AElig|Aacute|Acirc|Agrave|Alpha|Aring|Atilde|Auml|Beta|Ccedil|Chi|Dagger|Delta|ETH|Eacute|Ecirc|Egrave|Epsilon|Eta|Euml|Gamma|Iacute|Icirc|Igrave|Iota|Iuml|Kappa|Lambda|Mu|Ntilde|Nu|OElig|Oacute|Ocirc|Ograve|Omega|Omicron|Oslash|Otilde|Ouml|Phi|Pi|Prime|Psi|Rho|Scaron|Sigma|THORN|Tau|Theta|Uacute|Ucirc|Ugrave|Upsilon|Uuml|Xi|Yacute|Yuml|Zeta|aacute|acirc|acute|aelig|agrave|alefsym|alpha|amp|and|ang|apos|aring|asymp|atilde|auml|bdquo|beta|brvbar|bull|cap|ccedil|cedil|cent|chi|circ|clubs|cong|copy|crarr|cup|curren|dArr|dagger|darr|deg|delta|diams|divide|eacute|ecirc|egrave|empty|emsp|ensp|epsilon|equiv|eta|eth|euml|euro|exist|fnof|forall|frac12|frac14|frac34|frasl|gamma|ge|gt|hArr|harr|hearts|hellip|iacute|icirc|iexcl|igrave|image|infin|int|iota|iquest|isin|iuml|kappa|lArr|lambda|lang|laquo|larr|lceil|ldquo|le|lfloor|lowast|loz|lrm|lsaquo|lsquo|lt|macr|mdash|micro|middot|minus|mu|nabla|nbsp|ndash|ne|ni|not|notin|nsub|ntilde|nu|oacute|ocirc|oelig|ograve|oline|omega|omicron|oplus|or|ordf|ordm|oslash|otilde|otimes|ouml|para|part|permil|perp|phi|pi|piv|plusmn|pound|prime|prod|prop|psi|quot|rArr|radic|rang|raquo|rarr|rceil|rdquo|real|reg|rfloor|rho|rlm|rsaquo|rsquo|sbquo|scaron|sdot|sect|shy|sigma|sigmaf|sim|spades|sub|sube|sum|sup|sup1|sup2|sup3|supe|szlig|tau|there4|theta|thetasym|thinsp|thorn|tilde|times|trade|uArr|uacute|uarr|ucirc|ugrave|uml|upsih|upsilon|uuml|weierp|xi|yacute|yen|yuml|zeta|zwj|zwnj)\z/

ElementContent =
{"h6"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "object"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "param", "pre", "q",
   "s", "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "dl"=>["dd", "dt"],
 "p"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "acronym"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "code"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "ul"=>["li"],
 "tt"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "label"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "form"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "q"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "thead"=>["tr"],
 "area"=>:EMPTY,
 "td"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "title"=>[],
 "dir"=>["li"],
 "s"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "ol"=>["li"],
 "hr"=>:EMPTY,
 "applet"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "param", "pre", "q",
   "s", "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "table"=>["caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr"],
 "legend"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "cite"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "a"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "html"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "base", "basefont", "bdo",
   "big", "blockquote", "body", "br", "button", "center", "cite", "code",
   "dfn", "dir", "div", "dl", "em", "fieldset", "font", "form", "h1", "h2",
   "h3", "h4", "h5", "h6", "head", "hr", "i", "iframe", "img", "input",
   "isindex", "kbd", "label", "map", "menu", "noframes", "noscript", "object",
   "ol", "p", "pre", "q", "s", "samp", "script", "select", "small", "span",
   "strike", "strong", "sub", "sup", "table", "textarea", "title", "tt", "u",
   "ul", "var"],
 "u"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "blockquote"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "center"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "b"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "base"=>:EMPTY,
 "th"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "link"=>:EMPTY,
 "var"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "samp"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "div"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "textarea"=>[],
 "pre"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "head"=>["base", "isindex", "title"],
 "span"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "br"=>:EMPTY,
 "script"=>:CDATA,
 "noframes"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "style"=>:CDATA,
 "meta"=>:EMPTY,
 "dt"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "option"=>[],
 "kbd"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "big"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "tfoot"=>["tr"],
 "sup"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "bdo"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "isindex"=>:EMPTY,
 "dfn"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "fieldset"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "legend",
   "map", "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "em"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "font"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "tbody"=>["tr"],
 "noscript"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "li"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "col"=>:EMPTY,
 "small"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "dd"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "i"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "menu"=>["li"],
 "strong"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "basefont"=>:EMPTY,
 "img"=>:EMPTY,
 "optgroup"=>["option"],
 "map"=>
  ["address", "area", "blockquote", "center", "dir", "div", "dl", "fieldset",
   "form", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "isindex", "menu",
   "noframes", "noscript", "ol", "p", "pre", "table", "ul"],
 "h1"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "address"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "p", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "sub"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "param"=>:EMPTY,
 "input"=>:EMPTY,
 "h2"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "abbr"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "h3"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "strike"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "body"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "ins"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "button"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "h4"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "select"=>["optgroup", "option"],
 "caption"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "colgroup"=>["col"],
 "tr"=>["td", "th"],
 "del"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"],
 "h5"=>
  ["a", "abbr", "acronym", "applet", "b", "basefont", "bdo", "big", "br",
   "button", "cite", "code", "dfn", "em", "font", "i", "iframe", "img",
   "input", "kbd", "label", "map", "object", "q", "s", "samp", "script",
   "select", "small", "span", "strike", "strong", "sub", "sup", "textarea",
   "tt", "u", "var"],
 "iframe"=>
  ["a", "abbr", "acronym", "address", "applet", "b", "basefont", "bdo", "big",
   "blockquote", "br", "button", "center", "cite", "code", "dfn", "dir", "div",
   "dl", "em", "fieldset", "font", "form", "h1", "h2", "h3", "h4", "h5", "h6",
   "hr", "i", "iframe", "img", "input", "isindex", "kbd", "label", "map",
   "menu", "noframes", "noscript", "object", "ol", "p", "pre", "q", "s",
   "samp", "script", "select", "small", "span", "strike", "strong", "sub",
   "sup", "table", "textarea", "tt", "u", "ul", "var"]}

ElementInclusions =
{"head"=>["link", "meta", "object", "script", "style"], "body"=>["del", "ins"]}

ElementExclusions =
{"button"=>
  ["a", "button", "fieldset", "form", "iframe", "input", "isindex", "label",
   "select", "textarea"],
 "a"=>["a"],
 "dir"=>
  ["address", "blockquote", "center", "dir", "div", "dl", "fieldset", "form",
   "h1", "h2", "h3", "h4", "h5", "h6", "hr", "isindex", "menu", "noframes",
   "noscript", "ol", "p", "pre", "table", "ul"],
 "title"=>["link", "meta", "object", "script", "style"],
 "pre"=>
  ["applet", "basefont", "big", "font", "img", "object", "small", "sub",
   "sup"],
 "form"=>["form"],
 "menu"=>
  ["address", "blockquote", "center", "dir", "div", "dl", "fieldset", "form",
   "h1", "h2", "h3", "h4", "h5", "h6", "hr", "isindex", "menu", "noframes",
   "noscript", "ol", "p", "pre", "table", "ul"],
 "label"=>["label"]}

OmittedAttrName =
{"h6"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "object"=>
  {"bottom"=>"align", "declare"=>"declare", "left"=>"align", "ltr"=>"dir",
   "middle"=>"align", "right"=>"align", "rtl"=>"dir", "top"=>"align"},
 "dl"=>{"compact"=>"compact", "ltr"=>"dir", "rtl"=>"dir"},
 "p"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "acronym"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "code"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "ul"=>
  {"circle"=>"type", "compact"=>"compact", "disc"=>"type", "ltr"=>"dir",
   "rtl"=>"dir", "square"=>"type"},
 "tt"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "label"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "form"=>{"get"=>"method", "ltr"=>"dir", "post"=>"method", "rtl"=>"dir"},
 "q"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "thead"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "middle"=>"valign", "right"=>"align", "rtl"=>"dir", "top"=>"valign"},
 "area"=>
  {"circle"=>"shape", "default"=>"shape", "ltr"=>"dir", "nohref"=>"nohref",
   "poly"=>"shape", "rect"=>"shape", "rtl"=>"dir"},
 "td"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "col"=>"scope", "colgroup"=>"scope", "justify"=>"align",
   "left"=>"align", "ltr"=>"dir", "middle"=>"valign", "nowrap"=>"nowrap",
   "right"=>"align", "row"=>"scope", "rowgroup"=>"scope", "rtl"=>"dir",
   "top"=>"valign"},
"title"=>{"ltr"=>"dir", "rtl"=>"dir"},
|
||||||
|
"dir"=>{"compact"=>"compact", "ltr"=>"dir", "rtl"=>"dir"},
|
||||||
|
"s"=>{"ltr"=>"dir", "rtl"=>"dir"},
|
||||||
|
"ol"=>{"compact"=>"compact", "ltr"=>"dir", "rtl"=>"dir"},
|
||||||
|
"hr"=>
|
||||||
|
{"center"=>"align", "left"=>"align", "ltr"=>"dir", "noshade"=>"noshade",
|
||||||
|
"right"=>"align", "rtl"=>"dir"},
|
||||||
|
"applet"=>
|
||||||
|
{"bottom"=>"align", "left"=>"align", "middle"=>"align", "right"=>"align",
|
||||||
|
"top"=>"align"},
 "table"=>
  {"above"=>"frame", "all"=>"rules", "below"=>"frame", "border"=>"frame",
   "box"=>"frame", "center"=>"align", "cols"=>"rules", "groups"=>"rules",
   "hsides"=>"frame", "left"=>"align", "lhs"=>"frame", "ltr"=>"dir",
   "none"=>"rules", "rhs"=>"frame", "right"=>"align", "rows"=>"rules",
   "rtl"=>"dir", "void"=>"frame", "vsides"=>"frame"},
 "legend"=>
  {"bottom"=>"align", "left"=>"align", "ltr"=>"dir", "right"=>"align",
   "rtl"=>"dir", "top"=>"align"},
 "cite"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "a"=>
  {"circle"=>"shape", "default"=>"shape", "ltr"=>"dir", "poly"=>"shape",
   "rect"=>"shape", "rtl"=>"dir"},
 "html"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "u"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "blockquote"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "center"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "b"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "th"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "col"=>"scope", "colgroup"=>"scope", "justify"=>"align",
   "left"=>"align", "ltr"=>"dir", "middle"=>"valign", "nowrap"=>"nowrap",
   "right"=>"align", "row"=>"scope", "rowgroup"=>"scope", "rtl"=>"dir",
   "top"=>"valign"},
 "link"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "var"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "samp"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "div"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "textarea"=>
  {"disabled"=>"disabled", "ltr"=>"dir", "readonly"=>"readonly", "rtl"=>"dir"},
 "pre"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "head"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "span"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "br"=>{"all"=>"clear", "left"=>"clear", "none"=>"clear", "right"=>"clear"},
 "script"=>{"defer"=>"defer"},
 "noframes"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "style"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "meta"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "dt"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "option"=>
  {"disabled"=>"disabled", "ltr"=>"dir", "rtl"=>"dir", "selected"=>"selected"},
 "kbd"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "big"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "tfoot"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "middle"=>"valign", "right"=>"align", "rtl"=>"dir", "top"=>"valign"},
 "sup"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "bdo"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "isindex"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "dfn"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "fieldset"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "em"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "font"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "tbody"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "middle"=>"valign", "right"=>"align", "rtl"=>"dir", "top"=>"valign"},
 "noscript"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "li"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "col"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "middle"=>"valign", "right"=>"align", "rtl"=>"dir", "top"=>"valign"},
 "small"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "dd"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "i"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "menu"=>{"compact"=>"compact", "ltr"=>"dir", "rtl"=>"dir"},
 "strong"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "img"=>
  {"bottom"=>"align", "ismap"=>"ismap", "left"=>"align", "ltr"=>"dir",
   "middle"=>"align", "right"=>"align", "rtl"=>"dir", "top"=>"align"},
 "optgroup"=>{"disabled"=>"disabled", "ltr"=>"dir", "rtl"=>"dir"},
 "map"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "address"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "h1"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "sub"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "param"=>{"data"=>"valuetype", "object"=>"valuetype", "ref"=>"valuetype"},
 "input"=>
  {"bottom"=>"align", "button"=>"type", "checkbox"=>"type",
   "checked"=>"checked", "disabled"=>"disabled", "file"=>"type",
   "hidden"=>"type", "image"=>"type", "ismap"=>"ismap", "left"=>"align",
   "ltr"=>"dir", "middle"=>"align", "password"=>"type", "radio"=>"type",
   "readonly"=>"readonly", "reset"=>"type", "right"=>"align", "rtl"=>"dir",
   "submit"=>"type", "text"=>"type", "top"=>"align"},
 "h2"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "abbr"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "h3"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "strike"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "body"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "ins"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "button"=>
  {"button"=>"type", "disabled"=>"disabled", "ltr"=>"dir", "reset"=>"type",
   "rtl"=>"dir", "submit"=>"type"},
 "h4"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "select"=>
  {"disabled"=>"disabled", "ltr"=>"dir", "multiple"=>"multiple", "rtl"=>"dir"},
 "caption"=>
  {"bottom"=>"align", "left"=>"align", "ltr"=>"dir", "right"=>"align",
   "rtl"=>"dir", "top"=>"align"},
 "colgroup"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "middle"=>"valign", "right"=>"align", "rtl"=>"dir", "top"=>"valign"},
 "tr"=>
  {"baseline"=>"valign", "bottom"=>"valign", "center"=>"align",
   "char"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "middle"=>"valign", "right"=>"align", "rtl"=>"dir", "top"=>"valign"},
 "del"=>{"ltr"=>"dir", "rtl"=>"dir"},
 "h5"=>
  {"center"=>"align", "justify"=>"align", "left"=>"align", "ltr"=>"dir",
   "right"=>"align", "rtl"=>"dir"},
 "iframe"=>
  {"0"=>"frameborder", "1"=>"frameborder", "auto"=>"scrolling",
   "bottom"=>"align", "left"=>"align", "middle"=>"align", "no"=>"scrolling",
   "right"=>"align", "top"=>"align", "yes"=>"scrolling"}}

  # :startdoc:
# The code above is auto-generated. Don't edit manually.
end
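The table that closes above maps, for each element, a bare token to the attribute it belongs to (under `"td"`, for instance, `nowrap` maps to `nowrap` and `top` maps to `valign`). One plausible use of such a table is expanding SGML-style minimized attributes, and the sketch below illustrates that idea only. `MINIMIZED` and `expand_minimized` are hypothetical names, the hash is a hand-copied excerpt, and nothing here is claimed to be how Hpricot itself consults the table.

```ruby
# Hypothetical excerpt of the element => {token => attribute} table above.
MINIMIZED = {
  "input" => {"checkbox"=>"type", "checked"=>"checked", "disabled"=>"disabled"},
  "td"    => {"nowrap"=>"nowrap", "top"=>"valign"}
}

# Expand a bare token found inside a start tag into a name/value pair.
# Unknown tokens are kept as a valueless attribute (an assumption of this sketch).
def expand_minimized(tag, token)
  name = MINIMIZED.fetch(tag, {})[token]
  name ? { name => token } : { token => nil }
end

p expand_minimized("input", "checkbox")
p expand_minimized("td", "top")
p expand_minimized("td", "bogus")
```

Keeping the lookup per element matters because the same token can belong to different attributes: `top` is a `valign` value on table cells but an `align` value on `img`.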
@ -0,0 +1,107 @@
require 'pp'

module Hpricot
  # :stopdoc:
  class Elements
    def pretty_print(q)
      q.object_group(self) { super }
    end
    alias inspect pretty_print_inspect
  end

  class Doc
    def pretty_print(q)
      q.object_group(self) { @children.each {|elt| q.breakable; q.pp elt } }
    end
    alias inspect pretty_print_inspect
  end

  class Elem
    def pretty_print(q)
      if empty?
        q.group(1, '{emptyelem', '}') {
          q.breakable; q.pp @stag
        }
      else
        q.group(1, "{elem", "}") {
          q.breakable; q.pp @stag
          if @children
            @children.each {|elt| q.breakable; q.pp elt }
          end
          if @etag
            q.breakable; q.pp @etag
          end
        }
      end
    end
    alias inspect pretty_print_inspect
  end

  module Leaf
    def pretty_print(q)
      q.group(1, '{', '}') {
        q.text self.class.name.sub(/.*::/,'').downcase
        if rs = @raw_string
          rs.scan(/[^\r\n]*(?:\r\n?|\n|[^\r\n]\z)/) {|line|
            q.breakable
            q.pp line
          }
        elsif self.respond_to? :to_s
          q.breakable
          q.text self.to_s
        end
      }
    end
    alias inspect pretty_print_inspect
  end

  class STag
    def pretty_print(q)
      q.group(1, '<', '>') {
        q.text @name

        if @raw_attributes
          @raw_attributes.each {|n, t|
            q.breakable
            if t
              q.text "#{n}=\"#{Hpricot.uxs(t)}\""
            else
              q.text n
            end
          }
        end
      }
    end
    alias inspect pretty_print_inspect
  end

  class ETag
    def pretty_print(q)
      q.group(1, '</', '>') {
        q.text @name
      }
    end
    alias inspect pretty_print_inspect
  end

  class Text
    def pretty_print(q)
      q.text @content.dump
    end
  end

  class BogusETag
    def pretty_print(q)
      q.group(1, '{', '}') {
        q.text self.class.name.sub(/.*::/,'').downcase
        if rs = @raw_string
          q.breakable
          q.text rs
        else
          q.text "</#{@name}>"
        end
      }
    end
  end
  # :startdoc:
end
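The classes above share one pattern from Ruby's standard `pp` library: define `pretty_print(q)` in terms of `q.group`, `q.breakable` and `q.text`, then alias `inspect` to `pretty_print_inspect` so both `pp` and `inspect` use the same layout. A minimal standalone sketch of that pattern follows. `MiniTag` is a hypothetical stand-in for illustration, not an Hpricot class.

```ruby
require 'pp'

# Hypothetical stand-in for a start tag with a name and an attribute hash.
class MiniTag
  def initialize(name, attrs = {})
    @name, @attrs = name, attrs
  end

  # Called by pp with a PP printer object; emits "<name attr=\"value\" ...>".
  def pretty_print(q)
    q.group(1, '<', '>') {
      q.text @name
      @attrs.each { |k, v|
        q.breakable            # a space, or a line break if the group is too wide
        q.text "#{k}=\"#{v}\""
      }
    }
  end

  # pretty_print_inspect renders pretty_print on a single line.
  alias inspect pretty_print_inspect
end

puts MiniTag.new("a", {"href" => "index.html"}).inspect
```

The alias has to come after `pretty_print` is defined in a class that overrides it, because `pretty_print_inspect` raises if the object still uses the default `pretty_print`.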
@ -0,0 +1,37 @@
module Hpricot
  class Name; include Hpricot end
  class Context; include Hpricot end

  # :stopdoc:
  module Tag; include Hpricot end
  class STag; include Tag end
  class ETag; include Tag end
  # :startdoc:

  module Node; include Hpricot end
  module Container; include Node end
  class Doc; include Container end
  class Elem; include Container end
  module Leaf; include Node end
  class Text; include Leaf end
  class XMLDecl; include Leaf end
  class DocType; include Leaf end
  class ProcIns; include Leaf end
  class Comment; include Leaf end
  class BogusETag; include Leaf end

  module Traverse end
  module Container::Trav; include Traverse end
  module Leaf::Trav; include Traverse end
  class Doc; module Trav; include Container::Trav end; include Trav end
  class Elem; module Trav; include Container::Trav end; include Trav end
  class Text; module Trav; include Leaf::Trav end; include Trav end
  class XMLDecl; module Trav; include Leaf::Trav end; include Trav end
  class DocType; module Trav; include Leaf::Trav end; include Trav end
  class ProcIns; module Trav; include Leaf::Trav end; include Trav end
  class Comment; module Trav; include Leaf::Trav end; include Trav end
  class BogusETag; module Trav; include Leaf::Trav end; include Trav end

  class Error < StandardError; end
end
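The file above builds its node taxonomy from mixins rather than superclasses: `Container` and `Leaf` both include `Node`, and each concrete class includes one of them. The sketch below reproduces that shape in a hypothetical `Mini` namespace (all names under `Mini` are illustrative) to show what the arrangement buys: `is_a?` checks work across the whole hierarchy, and methods defined on a shared module reach every class that includes it.

```ruby
# Hypothetical miniature of the mixin hierarchy above.
module Mini
  module Node
    # Shared behaviour reaches every class that includes Node indirectly.
    def node_kind
      self.class.name.sub(/.*::/, '').downcase
    end
  end

  module Container; include Node end
  module Leaf;      include Node end

  class Doc;  include Container end
  class Elem; include Container end
  class Text; include Leaf end
end

text = Mini::Text.new
puts text.is_a?(Mini::Leaf)       # membership through the direct mixin
puts text.is_a?(Mini::Node)       # membership through the nested mixin
puts text.is_a?(Mini::Container)  # a leaf is not a container
puts text.node_kind
```

Using modules here leaves the superclass slot free, which is what later lets classes such as `STag` and `Text` inherit from a separate base class while still being `Tag` or `Leaf` members.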
@ -0,0 +1,297 @@
require 'hpricot/htmlinfo'

def Hpricot(input = nil, opts = {}, &blk)
  Hpricot.parse(input, opts, &blk)
end

module Hpricot
  # Exception class used for any errors related to deficiencies in the system when
  # handling the character encodings of a document.
  class EncodingError < StandardError; end

  # Hpricot.parse parses <i>input</i> and returns a document tree
  # represented by Hpricot::Doc.
  def Hpricot.parse(input = nil, opts = {}, &blk)
    Doc.new(make(input, opts, &blk))
  end

  # Hpricot::XML parses <i>input</i>, disregarding all the HTML rules
  # and returning a document tree.
  def Hpricot.XML(input, opts = {})
    Doc.new(make(input, opts.merge(:xml => true)))
  end

  # :stopdoc:

  def Hpricot.make(input = nil, opts = {}, &blk)
    opts = {:fixup_tags => false}.merge(opts)
    unless input or blk
      raise ArgumentError, "An Hpricot document must be built from an input source (a String) or a block."
    end

    conv = opts[:xml] ? :to_s : :downcase

    fragment =
    if input
      case opts[:encoding]
      when nil
      when 'utf-8'
        unless defined? Encoding::Character::UTF8
          raise EncodingError, "The ruby-character-encodings library could not be found for utf-8 mode."
        end
      else
        raise EncodingError, "No encoding option `#{opts[:encoding]}' is available."
      end

      if opts[:xhtml_strict]
        opts[:fixup_tags] = true
      end

      stack = [[nil, nil, [], [], [], []]]
      Hpricot.scan(input) do |token|
        if stack.last[5] == :CDATA and ![:procins, :comment, :cdata].include?(token[0]) and
            !(token[0] == :etag and token[1].casecmp(stack.last[0]).zero?)
          token[0] = :text
          token[1] = token[3] if token[3]
        end

        if !opts[:xml] and token[0] == :emptytag
          token[1] = token[1].send(conv)
          if ElementContent[token[1].downcase] != :EMPTY
            token[0] = :stag
          end
        end

        # TODO: downcase instead when parsing attributes?
        if !opts[:xml] and token[2].is_a?(Hash)
          token[2] = token[2].inject({}) { |hsh,(k,v)| hsh[k.downcase] = v; hsh }
        end

        case token[0]
        when :stag
          case opts[:encoding] when 'utf-8'
            token.map! { |str| u(str) if str.is_a? String }
          end

          stagname = token[0] = token[1] = token[1].send(conv)
          if ElementContent[stagname] == :EMPTY and !opts[:xml]
            token[0] = :emptytag
            stack.last[2] << token
          else
            unless opts[:xml]
              if opts[:fixup_tags]
                # obey the tag rules set up by the current element
                if ElementContent.has_key? stagname
                  trans = nil
                  (stack.length-1).downto(0) do |i|
                    untags = stack[i][5]
                    break unless untags.include? stagname
                    # puts "** ILLEGAL #{stagname} IN #{stack[i][0]}"
                    trans = i
                  end
                  if trans.to_i > 1
                    eles = stack.slice!(trans..-1)
                    stack.last[2] += eles
                    # puts "** TRANSPLANTED #{stagname} TO #{stack.last[0]}"
                  end
                elsif opts[:xhtml_strict]
                  token[2] = {'class' => stagname}
                  stagname = token[0] = "div"
                end
              end

              # set up the tag rules that apply inside this element
              if ElementContent[stagname] == :CDATA
                uncontainable_tags = :CDATA
              elsif opts[:fixup_tags]
                possible_tags = ElementContent[stagname]
                excluded_tags, included_tags = stack.last[3..4]
                if possible_tags
                  excluded_tags = excluded_tags | (ElementExclusions[stagname] || [])
                  included_tags = included_tags | (ElementInclusions[stagname] || [])
                  containable_tags = (possible_tags | included_tags) - excluded_tags
                  uncontainable_tags = ElementContent.keys - containable_tags
                else
                  # If the tagname is unknown, it is assumed that any element
                  # except the excluded ones can be contained.
                  uncontainable_tags = excluded_tags
                end
              end
            end
            unless opts[:xml]
              case token[2] when Hash
                token[2] = token[2].inject({}) { |hsh,(k,v)| hsh[k.downcase] = v; hsh }
              end
            end
            stack << [stagname, token, [], excluded_tags, included_tags, uncontainable_tags]
          end
        when :etag
          etagname = token[0] = token[1].send(conv)
          if opts[:xhtml_strict] and not ElementContent.has_key? etagname
            etagname = token[0] = "div"
          end
          matched_elem = nil
          (stack.length-1).downto(0) do |i|
            stagname, = stack[i]
            if stagname == etagname
              matched_elem = stack[i]
              stack[i][1] += token
              eles = stack.slice!((i+1)..-1)
              stack.last[2] += eles
              break
            end
          end
          unless matched_elem
            stack.last[2] << [:bogus_etag, token.first, token.last]
          else
            ele = stack.pop
            stack.last[2] << ele
          end
        when :text
          l = stack.last[2].last
          if l and l[0] == :text
            l[1] += token[1]
          else
            stack.last[2] << token
          end
        else
          stack.last[2] << token
        end
      end

      while 1 < stack.length
        ele = stack.pop
        stack.last[2] << ele
      end

      structure_list = stack[0][2]
      structure_list.map {|s| build_node(s, opts) }
    elsif blk
      Hpricot.build(&blk).children
    end
  end

  def Hpricot.build_node(structure, opts = {})
    case structure[0]
    when String
      tagname, _, attrs, sraw, _, _, _, eraw = structure[1]
      children = structure[2]
      etag = eraw && ETag.parse(tagname, eraw)
      stag = STag.parse(tagname, attrs, sraw, true)
      if !children.empty? || etag
        Elem.new(stag,
                  children.map {|c| build_node(c, opts) },
                  etag)
      else
        Elem.new(stag)
      end
    when :text
      Text.parse_pcdata(structure[1])
    when :emptytag
      Elem.new(STag.parse(structure[1], structure[2], structure[3], false))
    when :bogus_etag
      BogusETag.parse(structure[1], structure[2])
    when :xmldecl
      XMLDecl.parse(structure[2], structure[3])
    when :doctype
      if opts[:xhtml_strict]
        structure[2]['system_id'] = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
        structure[2]['public_id'] = "-//W3C//DTD XHTML 1.0 Strict//EN"
      end
      DocType.parse(structure[1], structure[2], structure[3])
    when :procins
      ProcIns.parse(structure[1])
    when :comment
      Comment.parse(structure[1])
    when :cdata_content
      Text.parse_cdata_content(structure[1])
    when :cdata
      Text.parse_cdata_section(structure[1])
    else
      raise Exception, "[bug] unknown structure: #{structure.inspect}"
    end
  end

  def STag.parse(qname, attrs, raw_string, is_stag)
    result = STag.new(qname, attrs)
    result.raw_string = raw_string
    result
  end

  def ETag.parse(qname, raw_string)
    result = self.new(qname)
    result.raw_string = raw_string
    result
  end

  def BogusETag.parse(qname, raw_string)
    result = self.new(qname)
    result.raw_string = raw_string
    result
  end

  def Text.parse_pcdata(raw_string)
    result = Text.new(raw_string)
    result
  end

  def Text.parse_cdata_content(raw_string)
    result = CData.new(raw_string)
    result
  end

  def Text.parse_cdata_section(content)
    result = CData.new(content)
    result
  end

  def XMLDecl.parse(attrs, raw_string)
    attrs ||= {}
    version = attrs['version']
    encoding = attrs['encoding']
    case attrs['standalone']
    when 'yes'
      standalone = true
    when 'no'
      standalone = false
    else
      standalone = nil
    end

    result = XMLDecl.new(version, encoding, standalone)
    result.raw_string = raw_string
    result
  end

  def DocType.parse(root_element_name, attrs, raw_string)
    if attrs
      public_identifier = attrs['public_id']
      system_identifier = attrs['system_id']
    end

    root_element_name = root_element_name.downcase

    result = DocType.new(root_element_name, public_identifier, system_identifier)
    result.raw_string = raw_string
    result
  end

  def ProcIns.parse(raw_string)
    _, target, content = *raw_string.match(/\A<\?(\S+)\s+(.+)/m)
    result = ProcIns.new(target, content)
    result
  end

  def Comment.parse(content)
    result = Comment.new(content)
    result
  end

  module Pat
    NameChar = /[-A-Za-z0-9._:]/
    Name = /[A-Za-z_:]#{NameChar}*/
    Nmtoken = /#{NameChar}+/
  end

  # :startdoc:
end
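`Hpricot.make` above turns a flat token stream into a tree with an explicit stack: a start tag pushes a frame, an end tag searches the stack from the top for its start tag and folds the frames above it back in, an end tag with no match is recorded as bogus, and anything still open at the end of input is closed implicitly. The sketch below is a much-simplified illustration of that idea only. `build_tree` and its two-element frames are hypothetical, it ignores attributes, encodings and the containment rules, and it nests unclosed inner elements where the real code reparents them.

```ruby
# Each stack frame is [tag_name, children]; frame 0 is the document root.
# Tokens are [type, value] pairs such as [:stag, "p"] or [:text, "hi"].
def build_tree(tokens)
  stack = [[nil, []]]
  tokens.each do |type, value|
    case type
    when :stag
      stack << [value, []]
    when :etag
      # Search from the innermost open element outwards for a matching start tag.
      idx = stack.rindex { |name, _| name == value }
      if idx
        # Close the matched element and everything opened after it.
        while stack.length > idx
          frame = stack.pop
          stack.last[1] << frame
        end
      else
        # No open element has this name: keep the stray end tag as a marker.
        stack.last[1] << [:bogus_etag, value]
      end
    else
      stack.last[1] << [type, value]
    end
  end

  # Fold any elements that were never closed into their parents.
  while stack.length > 1
    frame = stack.pop
    stack.last[1] << frame
  end
  stack[0][1]
end

tokens = [[:stag, "div"], [:stag, "p"], [:text, "hi"], [:etag, "div"], [:etag, "span"]]
p build_tree(tokens)
```

The root frame's name is `nil`, so no end tag can ever match it and the loop that closes frames never pops the root.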
@ -0,0 +1,228 @@
module Hpricot
  # :stopdoc:

  class Doc
    attr_accessor :children
    def initialize(children = [])
      @children = children ? children.each { |c| c.parent = self } : []
    end
    def output(out, opts = {})
      @children.each do |n|
        n.output(out, opts)
      end
      out
    end
    def altered!; end
  end

  class BaseEle
    attr_accessor :raw_string, :parent
    def html_quote(str)
      "\"" + str.gsub('"', '\\"') + "\""
    end
    def if_output(opts)
      if opts[:preserve] and not @raw_string.nil?
        @raw_string
      else
        yield opts
      end
    end
    def pathname; self.name end
    def altered!
      @raw_string = nil
    end
    def self.alterable(*fields)
      attr_accessor(*fields)
      fields.each do |f|
        define_method("#{f}=") do |v|
          altered!
          instance_variable_set("@#{f}", v)
        end
      end
    end
  end

  class Elem
    attr_accessor :stag, :etag, :children
    def initialize(stag, children=nil, etag=nil)
      @stag, @etag = stag, etag
      @children = children ? children.each { |c| c.parent = self } : []
    end
    def empty?; @children.empty? end
    [:name, :raw_attributes, :parent, :altered!].each do |m|
      [m, "#{m}="].each { |m2| define_method(m2) { |*a| [@etag, @stag].inject { |_,t| t.send(m2, *a) if t and t.respond_to?(m2) } } }
    end
    def attributes
      if raw_attributes
        raw_attributes.inject({}) do |hsh, (k, v)|
          hsh[k] = Hpricot.uxs(v)
          hsh
        end
      end
    end
    def to_plain_text
      if self.name == 'br'
        "\n"
      elsif self.name == 'p'
        "\n\n" + super + "\n\n"
      elsif self.name == 'a' and self.has_attribute?('href')
        "#{super} [#{self['href']}]"
      elsif self.name == 'img' and self.has_attribute?('src')
        "[img:#{self['src']}]"
      else
        super
      end
    end
    def pathname; self.name end
    def output(out, opts = {})
      if empty? and ElementContent[@stag.name] == :EMPTY
        @stag.output(out, opts.merge(:style => :empty))
      else
        @stag.output(out, opts)
        @children.each { |n| n.output(out, opts) }
        if @etag
          @etag.output(out, opts)
        elsif !opts[:preserve]
          ETag.new(@stag.name).output(out, opts)
        end
      end
      out
    end
  end

  class STag < BaseEle
    def initialize(name, attributes=nil)
      @name = name.to_s
      @raw_attributes = attributes || {}
    end
    alterable :name, :raw_attributes
    def attributes_as_html
      if @raw_attributes
        @raw_attributes.map do |aname, aval|
          " #{aname}" +
            (aval ? "=\"#{aval}\"" : "")
        end.join
      end
    end
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "<#{@name}#{attributes_as_html}" +
            (opts[:style] == :empty ? " /" : "") +
            ">"
        end
    end
  end

  class ETag < BaseEle
    def initialize(qualified_name)
      @name = qualified_name.to_s
    end
    alterable :name
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "</#{@name}>"
        end
    end
  end

  class BogusETag < ETag
    def output(out, opts = {}); out << if_output(opts) { '' }; end
  end

  class Text < BaseEle
    def initialize(text)
      @content = text
    end
    alterable :content
    def pathname; "text()" end
    def to_s
      Hpricot.uxs(@content)
    end
    alias_method :inner_text, :to_s
    alias_method :to_plain_text, :to_s
    def output(out, opts = {})
      out <<
        if_output(opts) do
          @content
        end
    end
  end

  class CData < Text
    alias_method :to_s, :content
    alias_method :to_plain_text, :content
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "<![CDATA[#@content]]>"
        end
    end
  end

  class XMLDecl < BaseEle
    def initialize(version, encoding, standalone)
      @version, @encoding, @standalone = version, encoding, standalone
    end
    alterable :version, :encoding, :standalone
    def pathname; "xmldecl()" end
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "<?xml version=\"#{@version}\"" +
            (@encoding ? " encoding=\"#{encoding}\"" : "") +
            (@standalone != nil ? " standalone=\"#{standalone ? 'yes' : 'no'}\"" : "") +
            "?>"
        end
    end
  end

  class DocType < BaseEle
    def initialize(target, pubid, sysid)
      @target, @public_id, @system_id = target, pubid, sysid
    end
    alterable :target, :public_id, :system_id
    def pathname; "doctype()" end
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "<!DOCTYPE #{@target} " +
            (@public_id ? "PUBLIC \"#{@public_id}\"" : "SYSTEM") +
            (@system_id ? " #{html_quote(@system_id)}" : "") + ">"
        end
    end
  end

  class ProcIns < BaseEle
    def initialize(target, content)
      @target, @content = target, content
    end
    def pathname; "procins()" end
    alterable :target, :content
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "<?#{@target}" +
            (@content ? " #{@content}" : "") +
            "?>"
        end
    end
  end

  class Comment < BaseEle
    def initialize(content)
      @content = content
    end
    def pathname; "comment()" end
    alterable :content
    def output(out, opts = {})
      out <<
        if_output(opts) do
          "<!--#{@content}-->"
        end
    end
  end

  # :startdoc:
end
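The `alterable` class macro in `BaseEle` above is what keeps `:preserve` output honest: every node remembers the raw source text it was parsed from, and any setter generated by `alterable` clears that text so an edited node is re-serialized instead of echoing stale markup. The sketch below isolates the mechanism. `BaseNode`, `Remark` and `to_html` are hypothetical names, and the sketch uses `attr_reader` plus a generated writer where the original uses `attr_accessor` and then overrides the writer.

```ruby
class BaseNode
  attr_accessor :raw_string

  # Forget the original source text once the node has been modified.
  def altered!
    @raw_string = nil
  end

  # Class macro: a reader plus a writer that invalidates raw_string first.
  def self.alterable(*fields)
    attr_reader(*fields)
    fields.each do |f|
      define_method("#{f}=") do |v|
        altered!
        instance_variable_set("@#{f}", v)
      end
    end
  end
end

# Hypothetical comment-like node built on the macro.
class Remark < BaseNode
  alterable :content

  def initialize(content, raw_string = nil)
    @content, @raw_string = content, raw_string
  end

  # With :preserve, reuse the original text if it is still valid.
  def to_html(opts = {})
    if opts[:preserve] && !@raw_string.nil?
      @raw_string
    else
      "<!--#{@content}-->"
    end
  end
end

note = Remark.new(" hi ", "<!--  hi  -->")
puts note.to_html(:preserve => true)   # original spacing survives
note.content = "changed"
puts note.to_html(:preserve => true)   # regenerated after the edit
```

Clearing the cached text in the setter, rather than comparing old and new values at output time, keeps serialization cheap and makes the invalidation impossible to forget for any field declared through the macro.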
@ -0,0 +1,164 @@
module Hpricot

  FORM_TAGS = [ :form, :input, :select, :textarea ]
  SELF_CLOSING_TAGS = [ :base, :meta, :link, :hr, :br, :param, :img, :area, :input, :col ]

  # Common sets of attributes.
  AttrCore = [:id, :class, :style, :title]
  AttrI18n = [:lang, 'xml:lang'.intern, :dir]
  AttrEvents = [:onclick, :ondblclick, :onmousedown, :onmouseup, :onmouseover, :onmousemove,
                :onmouseout, :onkeypress, :onkeydown, :onkeyup]
  AttrFocus = [:accesskey, :tabindex, :onfocus, :onblur]
  AttrHAlign = [:align, :char, :charoff]
  AttrVAlign = [:valign]
  Attrs = AttrCore + AttrI18n + AttrEvents

  # All the tags and attributes from XHTML 1.0 Strict
  class XHTMLStrict
    class << self
      attr_accessor :tags, :tagset, :forms, :self_closing, :doctype
    end
    @doctype = ["-//W3C//DTD XHTML 1.0 Strict//EN", "DTD/xhtml1-strict.dtd"]
    @tagset = {
      :html => AttrI18n + [:id, :xmlns],
      :head => AttrI18n + [:id, :profile],
      :title => AttrI18n + [:id],
      :base => [:href, :id],
      :meta => AttrI18n + [:id, :http, :name, :content, :scheme, 'http-equiv'.intern],
      :link => Attrs + [:charset, :href, :hreflang, :type, :rel, :rev, :media],
      :style => AttrI18n + [:id, :type, :media, :title, 'xml:space'.intern],
      :script => [:id, :charset, :type, :src, :defer, 'xml:space'.intern],
      :noscript => Attrs,
      :body => Attrs + [:onload, :onunload],
      :div => Attrs,
      :p => Attrs,
      :ul => Attrs,
      :ol => Attrs,
      :li => Attrs,
      :dl => Attrs,
      :dt => Attrs,
      :dd => Attrs,
      :address => Attrs,
      :hr => Attrs,
      :pre => Attrs + ['xml:space'.intern],
      :blockquote => Attrs + [:cite],
      :ins => Attrs + [:cite, :datetime],
      :del => Attrs + [:cite, :datetime],
      :a => Attrs + AttrFocus + [:charset, :type, :name, :href, :hreflang, :rel, :rev, :shape, :coords],
      :span => Attrs,
      :bdo => AttrCore + AttrEvents + [:lang, 'xml:lang'.intern, :dir],
      :br => AttrCore,
      :em => Attrs,
      :strong => Attrs,
      :dfn => Attrs,
      :code => Attrs,
      :samp => Attrs,
      :kbd => Attrs,
      :var => Attrs,
      :cite => Attrs,
      :abbr => Attrs,
      :acronym => Attrs,
      :q => Attrs + [:cite],
      :sub => Attrs,
      :sup => Attrs,
      :tt => Attrs,
      :i => Attrs,
      :b => Attrs,
      :big => Attrs,
      :small => Attrs,
      :object => Attrs + [:declare, :classid, :codebase, :data, :type, :codetype, :archive, :standby, :height, :width, :usemap, :name, :tabindex],
      :param => [:id, :name, :value, :valuetype, :type],
      :img => Attrs + [:src, :alt, :longdesc, :height, :width, :usemap, :ismap],
      :map => AttrI18n + AttrEvents + [:id, :class, :style, :title, :name],
      :area => Attrs + AttrFocus + [:shape, :coords, :href, :nohref, :alt],
      :form => Attrs + [:action, :method, :enctype, :onsubmit, :onreset, :accept, 'accept-charset'.intern],
      :label => Attrs + [:for, :accesskey, :onfocus, :onblur],
      :input => Attrs + AttrFocus + [:type, :name, :value, :checked, :disabled, :readonly, :size, :maxlength, :src, :alt, :usemap, :onselect, :onchange, :accept],
      :select => Attrs + [:name, :size, :multiple, :disabled, :tabindex, :onfocus, :onblur, :onchange],
      :optgroup => Attrs + [:disabled, :label],
      :option => Attrs + [:selected, :disabled, :label, :value],
      :textarea => Attrs + AttrFocus + [:name, :rows, :cols, :disabled, :readonly, :onselect, :onchange],
      :fieldset => Attrs,
      :legend => Attrs + [:accesskey],
      :button => Attrs + AttrFocus + [:name, :value, :type, :disabled],
      :table => Attrs + [:summary, :width, :border, :frame, :rules, :cellspacing, :cellpadding],
      :caption => Attrs,
      :colgroup => Attrs + AttrHAlign + AttrVAlign + [:span, :width],
      :col => Attrs + AttrHAlign + AttrVAlign + [:span, :width],
      :thead => Attrs + AttrHAlign + AttrVAlign,
      :tfoot => Attrs + AttrHAlign + AttrVAlign,
      :tbody => Attrs + AttrHAlign + AttrVAlign,
      :tr => Attrs + AttrHAlign + AttrVAlign,
      :th => Attrs + AttrHAlign + AttrVAlign + [:abbr, :axis, :headers, :scope, :rowspan, :colspan],
      :td => Attrs + AttrHAlign + AttrVAlign + [:abbr, :axis, :headers, :scope, :rowspan, :colspan],
      :h1 => Attrs,
      :h2 => Attrs,
      :h3 => Attrs,
      :h4 => Attrs,
      :h5 => Attrs,
      :h6 => Attrs
    }

    @tags = @tagset.keys
    @forms = @tags & FORM_TAGS
    @self_closing = @tags & SELF_CLOSING_TAGS
  end

# Additional tags found in XHTML 1.0 Transitional
|
||||||
|
class XHTMLTransitional
|
||||||
|
class << self
|
||||||
|
attr_accessor :tags, :tagset, :forms, :self_closing, :doctype
|
||||||
|
end
|
||||||
|
@doctype = ["-//W3C//DTD XHTML 1.0 Transitional//EN", "DTD/xhtml1-transitional.dtd"]
|
||||||
|
@tagset = XHTMLStrict.tagset.merge \
|
||||||
|
:strike => Attrs,
|
||||||
|
:center => Attrs,
|
||||||
|
:dir => Attrs + [:compact],
|
||||||
|
:noframes => Attrs,
|
||||||
|
:basefont => [:id, :size, :color, :face],
|
||||||
|
:u => Attrs,
|
||||||
|
:menu => Attrs + [:compact],
|
||||||
|
:iframe => AttrCore + [:longdesc, :name, :src, :frameborder, :marginwidth, :marginheight, :scrolling, :align, :height, :width],
|
||||||
|
:font => AttrCore + AttrI18n + [:size, :color, :face],
|
||||||
|
:s => Attrs,
|
||||||
|
:applet => AttrCore + [:codebase, :archive, :code, :object, :alt, :name, :width, :height, :align, :hspace, :vspace],
|
||||||
|
:isindex => AttrCore + AttrI18n + [:prompt]
|
||||||
|
|
||||||
|
# Additional attributes found in XHTML 1.0 Transitional
|
||||||
|
{ :script => [:language],
|
||||||
|
:a => [:target],
|
||||||
|
:td => [:bgcolor, :nowrap, :width, :height],
|
||||||
|
:p => [:align],
|
||||||
|
:h5 => [:align],
|
||||||
|
:h3 => [:align],
|
||||||
|
:li => [:type, :value],
|
||||||
|
:div => [:align],
|
||||||
|
:pre => [:width],
|
||||||
|
:body => [:background, :bgcolor, :text, :link, :vlink, :alink],
|
||||||
|
:ol => [:type, :compact, :start],
|
||||||
|
:h4 => [:align],
|
||||||
|
:h2 => [:align],
|
||||||
|
:object => [:align, :border, :hspace, :vspace],
|
||||||
|
:img => [:name, :align, :border, :hspace, :vspace],
|
||||||
|
:link => [:target],
|
||||||
|
:legend => [:align],
|
||||||
|
:dl => [:compact],
|
||||||
|
:input => [:align],
|
||||||
|
:h6 => [:align],
|
||||||
|
:hr => [:align, :noshade, :size, :width],
|
||||||
|
:base => [:target],
|
||||||
|
:ul => [:type, :compact],
|
||||||
|
:br => [:clear],
|
||||||
|
:form => [:name, :target],
|
||||||
|
:area => [:target],
|
||||||
|
:h1 => [:align]
|
||||||
|
}.each do |k, v|
|
||||||
|
@tagset[k] += v
|
||||||
|
end
|
||||||
|
|
||||||
|
@tags = @tagset.keys
|
||||||
|
@forms = @tags & FORM_TAGS
|
||||||
|
@self_closing = @tags & SELF_CLOSING_TAGS
|
||||||
|
end
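The merge-then-append pattern used to build the transitional tagset above can be sketched in isolation. This is only an illustration, not Hpricot code: `extend_tagset` and the small tables passed to it are hypothetical stand-ins. It shows that `Hash#merge` returns a new hash and that `+=` builds a new array, so the base table is left unchanged.

```ruby
# Hypothetical helper (not part of Hpricot): derive a larger tag table
# from a base table without modifying the base.
def extend_tagset(base, extra_tags, extra_attrs)
  # merge returns a new Hash, so +base+ itself is not modified.
  tagset = base.merge(extra_tags)
  # += builds a new Array per tag instead of mutating the shared one.
  extra_attrs.each { |tag, attrs| tagset[tag] += attrs }
  tagset
end

strict = { :p => [:id, :class], :br => [:id] }
transitional = extend_tagset(strict,
                             { :center => [:id, :class] },
                             { :p => [:align], :br => [:clear] })

p transitional[:p]  # the strict attributes plus :align
p strict[:p]        # still only the strict attributes
```

One consequence of this pattern: if the second table names a tag that is missing from the merged table, `tagset[tag]` is nil and the `+=` raises NoMethodError.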
|
||||||
|
|
||||||
|
end
|
@ -0,0 +1,821 @@
|
|||||||
|
require 'hpricot/elements'
|
||||||
|
require 'uri'
|
||||||
|
|
||||||
|
module Hpricot
|
||||||
|
module Traverse
|
||||||
|
# Is this object the enclosing HTML or XML document?
|
||||||
|
def doc?() Doc::Trav === self end
|
||||||
|
# Is this object an HTML or XML element?
|
||||||
|
def elem?() Elem::Trav === self end
|
||||||
|
# Is this object an HTML text node?
|
||||||
|
def text?() Text::Trav === self end
|
||||||
|
# Is this object an XML declaration?
|
||||||
|
def xmldecl?() XMLDecl::Trav === self end
|
||||||
|
# Is this object a doctype tag?
|
||||||
|
def doctype?() DocType::Trav === self end
|
||||||
|
# Is this object an XML processing instruction?
|
||||||
|
def procins?() ProcIns::Trav === self end
|
||||||
|
# Is this object a comment?
|
||||||
|
def comment?() Comment::Trav === self end
|
||||||
|
# Is this object a stranded end tag?
|
||||||
|
def bogusetag?() BogusETag::Trav === self end
|
||||||
|
|
||||||
|
# Builds an HTML string from this node and its contents.
|
||||||
|
# If you need to write to a stream, try calling <tt>output(io)</tt>
|
||||||
|
# as a method on this object.
|
||||||
|
def to_html
|
||||||
|
output("")
|
||||||
|
end
|
||||||
|
alias_method :to_s, :to_html
|
||||||
|
|
||||||
|
# Attempts to preserve the original HTML of the document, only
|
||||||
|
# outputting new tags for elements which have changed.
|
||||||
|
def to_original_html
|
||||||
|
output("", :preserve => true)
|
||||||
|
end
|
||||||
|
|
||||||
|
def index(name)
|
||||||
|
i = 0
|
||||||
|
return i if name == "*"
|
||||||
|
children.each do |x|
|
||||||
|
return i if (x.respond_to?(:name) and name == x.name) or
|
||||||
|
(x.text? and name == "text()")
|
||||||
|
i += 1
|
||||||
|
end
|
||||||
|
-1
|
||||||
|
end
|
||||||
|
|
||||||
|
# Puts together an array of neighboring nodes based on their proximity
|
||||||
|
# to this node. So, for example, to get the next node, you could use
|
||||||
|
# <tt>nodes_at(1)</tt>. Or, to get the previous node, use <tt>nodes_at(-1)</tt>.
|
||||||
|
#
|
||||||
|
# This method also accepts ranges and sets of numbers.
|
||||||
|
#
|
||||||
|
# ele.nodes_at(-3..-1, 1..3) # gets three nodes before and three after
|
||||||
|
# ele.nodes_at(1, 5, 7) # gets three nodes at offsets below the current node
|
||||||
|
# ele.nodes_at(0, 5..6) # the current node and two others
|
||||||
|
def nodes_at(*pos)
|
||||||
|
sib = parent.children
|
||||||
|
i, si = 0, sib.index(self)
|
||||||
|
pos.map! do |r|
|
||||||
|
if r.is_a?(Range) and r.begin.is_a?(String)
|
||||||
|
r = Range.new(parent.index(r.begin)-si, parent.index(r.end)-si, r.exclude_end?)
|
||||||
|
end
|
||||||
|
r
|
||||||
|
end
|
||||||
|
|
||||||
|
Elements[*
|
||||||
|
sib.select do |x|
|
||||||
|
sel =
|
||||||
|
case i - si when *pos
|
||||||
|
true
|
||||||
|
end
|
||||||
|
i += 1
|
||||||
|
sel
|
||||||
|
end
|
||||||
|
]
|
||||||
|
end
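The offset matching in nodes_at relies on `case ... when *pos`, where each entry is compared with `===`, so plain integers and ranges can be mixed freely. A minimal standalone sketch of that idea on an ordinary array follows. `neighbors_at` is a hypothetical helper, not an Hpricot method, and it assumes the pivot is present in the list.

```ruby
# Hypothetical helper (not part of Hpricot): pick the entries of +list+
# whose offset from +pivot+ matches any of the given integers or ranges.
def neighbors_at(list, pivot, *offsets)
  origin = list.index(pivot)
  picked = list.each_with_index.select do |_item, i|
    # Integer#=== and Range#=== let one +when+ clause handle both kinds.
    case i - origin
    when *offsets then true
    else false
    end
  end
  picked.map(&:first)
end

nodes = %w[a b c d e f g]
p neighbors_at(nodes, "d", -1)         # => ["c"]
p neighbors_at(nodes, "d", -2..-1, 1)  # => ["b", "c", "e"]
```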
|
||||||
|
|
||||||
|
# Returns the node neighboring this node to the south: just below it.
|
||||||
|
# This method includes text nodes and comments and such.
|
||||||
|
def next
|
||||||
|
sib = parent.children
|
||||||
|
sib[sib.index(self) + 1] if parent
|
||||||
|
end
|
||||||
|
alias_method :next_node, :next
|
||||||
|
|
||||||
|
# Returns the node neighboring this node to the north: just above it.
|
||||||
|
# This method includes text nodes and comments and such.
|
||||||
|
def previous
|
||||||
|
sib = parent.children
|
||||||
|
x = sib.index(self) - 1
|
||||||
|
sib[x] if sib and x >= 0
|
||||||
|
end
|
||||||
|
alias_method :previous_node, :previous
|
||||||
|
|
||||||
|
# Find all preceding nodes.
|
||||||
|
def preceding
|
||||||
|
sibs = parent.children
|
||||||
|
si = sibs.index(self)
|
||||||
|
return Elements[*sibs[0...si]]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Find all nodes which follow the current one.
|
||||||
|
def following
|
||||||
|
sibs = parent.children
|
||||||
|
si = sibs.index(self) + 1
|
||||||
|
return Elements[*sibs[si...sibs.length]]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Adds elements immediately after this element, contained in the +html+ string.
|
||||||
|
def after(html = nil, &blk)
|
||||||
|
parent.insert_after(Hpricot.make(html, &blk), self)
|
||||||
|
end
|
||||||
|
|
||||||
|
# Adds elements immediately before this element, contained in the +html+ string.
|
||||||
|
def before(html = nil, &blk)
|
||||||
|
parent.insert_before(Hpricot.make(html, &blk), self)
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
# Replace this element and its contents with the nodes contained
|
||||||
|
# in the +html+ string.
|
||||||
|
def swap(html = nil, &blk)
|
||||||
|
parent.altered!
|
||||||
|
parent.replace_child(self, Hpricot.make(html, &blk))
|
||||||
|
end
|
||||||
|
|
||||||
|
def get_subnode(*indexes)
|
||||||
|
n = self
|
||||||
|
indexes.each {|index|
|
||||||
|
n = n.get_subnode_internal(index)
|
||||||
|
}
|
||||||
|
n
|
||||||
|
end
|
||||||
|
|
||||||
|
# Converts the contents of this node to a simple plain-text format. All
|
||||||
|
# HTML elements are removed.
|
||||||
|
def to_plain_text
|
||||||
|
if respond_to? :children
|
||||||
|
children.map { |x| x.to_plain_text }.join.strip.gsub(/\n{2,}/, "\n\n")
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# Concatenates the text nodes contained in this node into a single string. All
|
||||||
|
# HTML elements are removed.
|
||||||
|
def inner_text
|
||||||
|
if respond_to? :children
|
||||||
|
children.map { |x| x.inner_text }.join
|
||||||
|
end
|
||||||
|
end
|
||||||
|
alias_method :innerText, :inner_text
|
||||||
|
|
||||||
|
# Builds an HTML string from the contents of this node.
|
||||||
|
def html(inner = nil, &blk)
|
||||||
|
if inner or blk
|
||||||
|
altered!
|
||||||
|
case inner
|
||||||
|
when Array
|
||||||
|
self.children = inner
|
||||||
|
else
|
||||||
|
self.children = Hpricot.make(inner, &blk)
|
||||||
|
end
|
||||||
|
reparent self.children
|
||||||
|
else
|
||||||
|
if respond_to? :children
|
||||||
|
children.map { |x| x.output("") }.join
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
alias_method :inner_html, :html
|
||||||
|
alias_method :innerHTML, :inner_html
|
||||||
|
|
||||||
|
# Inserts new contents into the current node, based on
|
||||||
|
# the HTML contained in string +inner+.
|
||||||
|
def inner_html=(inner)
|
||||||
|
html(inner || [])
|
||||||
|
end
|
||||||
|
alias_method :innerHTML=, :inner_html=
|
||||||
|
|
||||||
|
def reparent(nodes)
|
||||||
|
altered!
|
||||||
|
[*nodes].each { |e| e.parent = self }
|
||||||
|
end
|
||||||
|
private :reparent
|
||||||
|
|
||||||
|
def clean_path(path)
|
||||||
|
path.gsub(/^\s+|\s+$/, '')
|
||||||
|
end
|
||||||
|
|
||||||
|
# Builds a unique XPath string for this node, from the
|
||||||
|
# root of the document containing it.
|
||||||
|
def xpath
|
||||||
|
if elem? and has_attribute? 'id'
|
||||||
|
"//#{self.name}[@id='#{get_attribute('id')}']"
|
||||||
|
else
|
||||||
|
sim, id = 0, 0
|
||||||
|
parent.children.each do |e|
|
||||||
|
id = sim if e == self
|
||||||
|
sim += 1 if e.pathname == self.pathname
|
||||||
|
end
|
||||||
|
p = File.join(parent.xpath, self.pathname)
|
||||||
|
p += "[#{id+1}]" if sim >= 2
|
||||||
|
p
|
||||||
|
end
|
||||||
|
end
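The positional part of the xpath method above (count the siblings that share a path name, then append a 1-based index only when there are at least two) can be sketched on plain strings. `xpath_step` is a hypothetical helper used only for illustration. It takes the tag names of all siblings rather than real nodes.

```ruby
# Hypothetical helper (not part of Hpricot): builds one XPath step for the
# sibling at +index+, given the tag names of all siblings in document order.
def xpath_step(sibling_names, index)
  name = sibling_names[index]
  # Positions of every sibling sharing this tag name.
  same = sibling_names.each_index.select { |i| sibling_names[i] == name }
  step = name.dup
  # XPath indices are 1-based; add one only when the name is ambiguous.
  step << "[#{same.index(index) + 1}]" if same.length >= 2
  step
end

puts xpath_step(%w[p div p p], 2)  # prints p[2]
puts xpath_step(%w[p div p p], 1)  # prints div
```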
|
||||||
|
|
||||||
|
# Builds a unique CSS string for this node, from the
|
||||||
|
# root of the document containing it.
|
||||||
|
def css_path
|
||||||
|
if elem? and has_attribute? 'id'
|
||||||
|
"##{get_attribute('id')}"
|
||||||
|
else
|
||||||
|
sim, i, id = 0, 0, 0
|
||||||
|
parent.children.each do |e|
|
||||||
|
id = sim if e == self
|
||||||
|
sim += 1 if e.pathname == self.pathname
|
||||||
|
end
|
||||||
|
p = parent.css_path
|
||||||
|
p = p ? "#{p} > #{self.pathname}" : self.pathname
|
||||||
|
p += ":nth(#{id})" if sim >= 2
|
||||||
|
p
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def node_position
|
||||||
|
parent.children.index(self)
|
||||||
|
end
|
||||||
|
|
||||||
|
def position
|
||||||
|
parent.children_of_type(self.pathname).index(self)
|
||||||
|
end
|
||||||
|
|
||||||
|
# Searches this node for all elements matching
|
||||||
|
# the CSS or XPath +expr+. Returns an Elements array
|
||||||
|
# containing the matching nodes. If +blk+ is given, it
|
||||||
|
# is used to iterate through the matching set.
|
||||||
|
def search(expr, &blk)
|
||||||
|
if Range === expr
|
||||||
|
return Elements.expand(at(expr.begin), at(expr.end), expr.exclude_end?)
|
||||||
|
end
|
||||||
|
last = nil
|
||||||
|
nodes = [self]
|
||||||
|
done = []
|
||||||
|
expr = expr.to_s
|
||||||
|
hist = []
|
||||||
|
until expr.empty?
|
||||||
|
expr = clean_path(expr)
|
||||||
|
expr.gsub!(%r!^//!, '')
|
||||||
|
|
||||||
|
case expr
|
||||||
|
when %r!^/?\.\.!
|
||||||
|
last = expr = $'
|
||||||
|
nodes.map! { |node| node.parent }
|
||||||
|
when %r!^[>/]\s*!
|
||||||
|
last = expr = $'
|
||||||
|
nodes = Elements[*nodes.map { |node| node.children if node.respond_to? :children }.flatten.compact]
|
||||||
|
when %r!^\+!
|
||||||
|
last = expr = $'
|
||||||
|
nodes.map! do |node|
|
||||||
|
siblings = node.parent.children
|
||||||
|
siblings[siblings.index(node)+1]
|
||||||
|
end
|
||||||
|
nodes.compact!
|
||||||
|
when %r!^~!
|
||||||
|
last = expr = $'
|
||||||
|
nodes.map! do |node|
|
||||||
|
siblings = node.parent.children
|
||||||
|
siblings[(siblings.index(node)+1)..-1]
|
||||||
|
end
|
||||||
|
nodes.flatten!
|
||||||
|
when %r!^[|,]!
|
||||||
|
last = expr = " #$'"
|
||||||
|
nodes.shift if nodes.first == self
|
||||||
|
done += nodes
|
||||||
|
nodes = [self]
|
||||||
|
else
|
||||||
|
m = expr.match(%r!^([#.]?)([a-z0-9\\*_-]*)!i).to_a
|
||||||
|
after = $'
|
||||||
|
mt = after[%r!:[a-z0-9\\*_-]+!i, 0]
|
||||||
|
oop = false
|
||||||
|
if mt and not (mt == ":not" or Traverse.method_defined? "filter[#{mt}]")
|
||||||
|
after = $'
|
||||||
|
m[2] += mt
|
||||||
|
expr = after
|
||||||
|
end
|
||||||
|
if m[1] == '#'
|
||||||
|
oid = get_element_by_id(m[2])
|
||||||
|
nodes = oid ? [oid] : []
|
||||||
|
expr = after
|
||||||
|
else
|
||||||
|
m[2] = "*" if after =~ /^\(\)/ || m[2] == "" || m[1] == "."
|
||||||
|
ret = []
|
||||||
|
nodes.each do |node|
|
||||||
|
case m[2]
|
||||||
|
when '*'
|
||||||
|
node.traverse_element { |n| ret << n }
|
||||||
|
else
|
||||||
|
if node.respond_to? :get_elements_by_tag_name
|
||||||
|
ret += [*node.get_elements_by_tag_name(m[2])] - [*(node unless last)]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
nodes = ret
|
||||||
|
end
|
||||||
|
last = nil
|
||||||
|
end
|
||||||
|
|
||||||
|
hist << expr
|
||||||
|
break if hist[-1] == hist[-2]
|
||||||
|
nodes, expr = Elements.filter(nodes, expr)
|
||||||
|
end
|
||||||
|
nodes = done + nodes.flatten.uniq
|
||||||
|
if blk
|
||||||
|
nodes.each(&blk)
|
||||||
|
self
|
||||||
|
else
|
||||||
|
Elements[*nodes]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
alias_method :/, :search
|
||||||
|
|
||||||
|
# Find the first matching node for the CSS or XPath
|
||||||
|
# +expr+ string.
|
||||||
|
def at(expr)
|
||||||
|
search(expr).first
|
||||||
|
end
|
||||||
|
alias_method :%, :at
|
||||||
|
|
||||||
|
# +traverse_element+ traverses elements in the tree.
|
||||||
|
# It yields elements in depth first order.
|
||||||
|
#
|
||||||
|
# If _names_ are empty, it yields all elements.
|
||||||
|
# If non-empty _names_ are given, they should be a list of universal names.
|
||||||
|
#
|
||||||
|
# A nested element is yielded in depth first order as follows.
|
||||||
|
#
|
||||||
|
# t = Hpricot('<a id=0><b><a id=1 /></b><c id=2 /></a>')
|
||||||
|
# t.traverse_element("a", "c") {|e| p e}
|
||||||
|
# # =>
|
||||||
|
# {elem <a id="0"> {elem <b> {emptyelem <a id="1">} </b>} {emptyelem <c id="2">} </a>}
|
||||||
|
# {emptyelem <a id="1">}
|
||||||
|
# {emptyelem <c id="2">}
|
||||||
|
#
|
||||||
|
# Universal names are specified as follows.
|
||||||
|
#
|
||||||
|
# t = Hpricot(<<'End')
|
||||||
|
# <html>
|
||||||
|
# <meta name="robots" content="index,nofollow">
|
||||||
|
# <meta name="author" content="Who am I?">
|
||||||
|
# </html>
|
||||||
|
# End
|
||||||
|
# t.traverse_element("{http://www.w3.org/1999/xhtml}meta") {|e| p e}
|
||||||
|
# # =>
|
||||||
|
# {emptyelem <{http://www.w3.org/1999/xhtml}meta name="robots" content="index,nofollow">}
|
||||||
|
# {emptyelem <{http://www.w3.org/1999/xhtml}meta name="author" content="Who am I?">}
|
||||||
|
#
|
||||||
|
def traverse_element(*names, &block) # :yields: element
|
||||||
|
if names.empty?
|
||||||
|
traverse_all_element(&block)
|
||||||
|
else
|
||||||
|
name_set = {}
|
||||||
|
names.each {|n| name_set[n] = true }
|
||||||
|
traverse_some_element(name_set, &block)
|
||||||
|
end
|
||||||
|
nil
|
||||||
|
end
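The depth-first order described for traverse_element can be reproduced with a small standalone sketch. It does not use Hpricot: `Node`, `each_named`, and `walk` are hypothetical names, and the tree mirrors the `<a id=0><b><a id=1 /></b><c id=2 /></a>` example from the comment above.

```ruby
# Hypothetical stand-in for an element: a name, an id, and child nodes.
Node = Struct.new(:name, :id, :children)

# Yields matching nodes in depth-first (pre-order) order.
# With no names given, every node is yielded.
def each_named(node, *names, &block)
  wanted = {}
  names.each { |n| wanted[n] = true }
  walk(node, wanted, &block)
  nil
end

def walk(node, wanted, &block)
  block.call(node) if wanted.empty? || wanted.include?(node.name)
  node.children.each { |child| walk(child, wanted, &block) }
end

tree = Node.new("a", 0, [
  Node.new("b", nil, [Node.new("a", 1, [])]),
  Node.new("c", 2, [])
])

# Prints "a id=0", "a id=1", "c id=2", one per line.
each_named(tree, "a", "c") { |n| puts "#{n.name} id=#{n.id}" }
```

A hash is used as the name set so that each membership check is a single lookup, which matches the `name_set` approach in the method above.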
|
||||||
|
|
||||||
|
# Find children of a given +tag_name+.
|
||||||
|
#
|
||||||
|
# ele.children_of_type('p')
|
||||||
|
# #=> [...array of paragraphs...]
|
||||||
|
#
|
||||||
|
def children_of_type(tag_name)
|
||||||
|
if respond_to? :children
|
||||||
|
children.find_all do |x|
|
||||||
|
x.respond_to?(:pathname) && x.pathname == tag_name
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
module Container::Trav
|
||||||
|
# Return all children of this node which can contain other
|
||||||
|
# nodes. This is a good way to get all HTML elements which
|
||||||
|
# aren't text, comment, doctype or processing instruction nodes.
|
||||||
|
def containers
|
||||||
|
children.grep(Container::Trav)
|
||||||
|
end
|
||||||
|
|
||||||
|
# Returns the container node neighboring this node to the south: just below it.
|
||||||
|
# By "container" node, I mean: this method does not find text nodes or comments or cdata or any of that.
|
||||||
|
# See Hpricot::Traverse#next_node if you need to hunt out all kinds of nodes.
|
||||||
|
def next_sibling
|
||||||
|
sib = parent.containers
|
||||||
|
sib[sib.index(self) + 1] if parent
|
||||||
|
end
|
||||||
|
|
||||||
|
# Returns the container node neighboring this node to the north: just above it.
|
||||||
|
# By "container" node, I mean: this method does not find text nodes or comments or cdata or any of that.
|
||||||
|
# See Hpricot::Traverse#previous_node if you need to hunt out all kinds of nodes.
|
||||||
|
def previous_sibling
|
||||||
|
sib = parent.containers
|
||||||
|
x = sib.index(self) - 1
|
||||||
|
sib[x] if sib and x >= 0
|
||||||
|
end
|
||||||
|
|
||||||
|
# Find all preceding sibling elements. Like the other "sibling" methods, this weeds
|
||||||
|
# out text and comment nodes.
|
||||||
|
def preceding_siblings()
|
||||||
|
sibs = parent.containers
|
||||||
|
si = sibs.index(self)
|
||||||
|
return Elements[*sibs[0...si]]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Find sibling elements which follow the current one. Like the other "sibling" methods, this weeds
|
||||||
|
# out text and comment nodes.
|
||||||
|
def following_siblings()
|
||||||
|
sibs = parent.containers
|
||||||
|
si = sibs.index(self) + 1
|
||||||
|
return Elements[*sibs[si...sibs.length]]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Puts together an array of neighboring sibling elements based on their proximity
|
||||||
|
# to this element.
|
||||||
|
#
|
||||||
|
# This method accepts ranges and sets of numbers.
|
||||||
|
#
|
||||||
|
# ele.siblings_at(-3..-1, 1..3) # gets three elements before and three after
|
||||||
|
# ele.siblings_at(1, 5, 7) # gets three elements at offsets below the current element
|
||||||
|
# ele.siblings_at(0, 5..6) # the current element and two others
|
||||||
|
#
|
||||||
|
# Like the other "sibling" methods, this doesn't find text and comment nodes.
|
||||||
|
# Use nodes_at to include those nodes.
|
||||||
|
def siblings_at(*pos)
|
||||||
|
sib = parent.containers
|
||||||
|
i, si = 0, sib.index(self)
|
||||||
|
Elements[*
|
||||||
|
sib.select do |x|
|
||||||
|
sel = case i - si when *pos
|
||||||
|
true
|
||||||
|
end
|
||||||
|
i += 1
|
||||||
|
sel
|
||||||
|
end
|
||||||
|
]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Replace +old+, a child of the current node, with +new+ node.
|
||||||
|
def replace_child(old, new)
|
||||||
|
reparent new
|
||||||
|
children[children.index(old), 1] = [*new]
|
||||||
|
end
|
||||||
|
|
||||||
|
# Insert +nodes+, an array of HTML elements or a single element,
|
||||||
|
# before the node +ele+, a child of the current node.
|
||||||
|
def insert_before(nodes, ele)
|
||||||
|
case nodes
|
||||||
|
when Array
|
||||||
|
nodes.each { |n| insert_before(n, ele) }
|
||||||
|
else
|
||||||
|
reparent nodes
|
||||||
|
children[children.index(ele) || 0, 0] = nodes
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# Insert +nodes+, an array of HTML elements or a single element,
|
||||||
|
# after the node +ele+, a child of the current node.
|
||||||
|
def insert_after(nodes, ele)
|
||||||
|
case nodes
|
||||||
|
when Array
|
||||||
|
nodes.reverse_each { |n| insert_after(n, ele) }
|
||||||
|
else
|
||||||
|
reparent nodes
|
||||||
|
idx = children.index(ele)
|
||||||
|
children[idx ? idx + 1 : children.length, 0] = nodes
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# +each_child+ iterates over each child.
|
||||||
|
def each_child(&block) # :yields: child_node
|
||||||
|
children.each(&block)
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
|
||||||
|
# +each_child_with_index+ iterates over each child, yielding the child and its index.
|
||||||
|
def each_child_with_index(&block) # :yields: child_node, index
|
||||||
|
children.each_with_index(&block)
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
|
||||||
|
# +find_element+ searches for an element whose universal name is specified by
|
||||||
|
# the arguments.
|
||||||
|
# It returns nil if not found.
|
||||||
|
def find_element(*names)
|
||||||
|
traverse_element(*names) {|e| return e }
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
|
||||||
|
# Returns a list of CSS classes to which this element belongs.
|
||||||
|
def classes
|
||||||
|
get_attribute('class').to_s.strip.split(/\s+/)
|
||||||
|
end
|
||||||
|
|
||||||
|
def get_element_by_id(id)
|
||||||
|
traverse_all_element do |ele|
|
||||||
|
if ele.elem? and eid = ele.get_attribute('id')
|
||||||
|
return ele if eid.to_s == id
|
||||||
|
end
|
||||||
|
end
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
|
||||||
|
def get_elements_by_tag_name(*a)
|
||||||
|
list = Elements[]
|
||||||
|
traverse_element(*a.map { |tag| [tag, "{http://www.w3.org/1999/xhtml}#{tag}"] }.flatten) do |e|
|
||||||
|
list << e
|
||||||
|
end
|
||||||
|
list
|
||||||
|
end
|
||||||
|
|
||||||
|
def each_hyperlink_attribute
|
||||||
|
traverse_element(
|
||||||
|
'{http://www.w3.org/1999/xhtml}a',
|
||||||
|
'{http://www.w3.org/1999/xhtml}area',
|
||||||
|
'{http://www.w3.org/1999/xhtml}link',
|
||||||
|
'{http://www.w3.org/1999/xhtml}img',
|
||||||
|
'{http://www.w3.org/1999/xhtml}object',
|
||||||
|
'{http://www.w3.org/1999/xhtml}q',
|
||||||
|
'{http://www.w3.org/1999/xhtml}blockquote',
|
||||||
|
'{http://www.w3.org/1999/xhtml}ins',
|
||||||
|
'{http://www.w3.org/1999/xhtml}del',
|
||||||
|
'{http://www.w3.org/1999/xhtml}form',
|
||||||
|
'{http://www.w3.org/1999/xhtml}input',
|
||||||
|
'{http://www.w3.org/1999/xhtml}head',
|
||||||
|
'{http://www.w3.org/1999/xhtml}base',
|
||||||
|
'{http://www.w3.org/1999/xhtml}script') {|elem|
|
||||||
|
case elem.name
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:base|a|area|link)\z}i
|
||||||
|
attrs = ['href']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:img)\z}i
|
||||||
|
attrs = ['src', 'longdesc', 'usemap']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:object)\z}i
|
||||||
|
attrs = ['classid', 'codebase', 'data', 'usemap']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:q|blockquote|ins|del)\z}i
|
||||||
|
attrs = ['cite']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:form)\z}i
|
||||||
|
attrs = ['action']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:input)\z}i
|
||||||
|
attrs = ['src', 'usemap']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:head)\z}i
|
||||||
|
attrs = ['profile']
|
||||||
|
when %r{\{http://www.w3.org/1999/xhtml\}(?:script)\z}i
|
||||||
|
attrs = ['src', 'for']
|
||||||
|
end
|
||||||
|
attrs.each {|attr|
|
||||||
|
if hyperlink = elem.get_attribute(attr)
|
||||||
|
yield elem, attr, hyperlink
|
||||||
|
end
|
||||||
|
}
|
||||||
|
}
|
||||||
|
end
|
||||||
|
private :each_hyperlink_attribute
|
||||||
|
|
||||||
|
# +each_hyperlink_uri+ traverses hyperlinks such as HTML href attribute
|
||||||
|
# of A element.
|
||||||
|
#
|
||||||
|
# It yields Hpricot::Text and URI for each hyperlink.
|
||||||
|
#
|
||||||
|
# The URI objects are created with a base URI which is given by
|
||||||
|
# HTML BASE element or the argument ((|base_uri|)).
|
||||||
|
# +each_hyperlink_uri+ doesn't yield the href of the BASE element.
|
||||||
|
def each_hyperlink_uri(base_uri=nil) # :yields: hyperlink, uri
|
||||||
|
base_uri = URI.parse(base_uri) if String === base_uri
|
||||||
|
links = []
|
||||||
|
each_hyperlink_attribute {|elem, attr, hyperlink|
|
||||||
|
if %r{\{http://www.w3.org/1999/xhtml\}(?:base)\z}i =~ elem.name
|
||||||
|
base_uri = URI.parse(hyperlink.to_s)
|
||||||
|
else
|
||||||
|
links << hyperlink
|
||||||
|
end
|
||||||
|
}
|
||||||
|
if base_uri
|
||||||
|
links.each {|hyperlink| yield hyperlink, base_uri + hyperlink.to_s }
|
||||||
|
else
|
||||||
|
links.each {|hyperlink| yield hyperlink, URI.parse(hyperlink.to_s) }
|
||||||
|
end
|
||||||
|
end
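The base-URI handling in each_hyperlink_uri comes down to `URI#+`, which resolves a reference against a base. A minimal sketch using only Ruby's standard `uri` library follows. `resolve_links` is a hypothetical helper and the example.com addresses are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not part of Hpricot): resolve each href against
# +base+ when one is given, otherwise parse the href on its own.
def resolve_links(base, hrefs)
  base = URI.parse(base) if String === base
  hrefs.map { |href| base ? base + href : URI.parse(href) }
end

links = ["guide.html", "/about", "http://other.example/x"]
resolve_links("http://example.com/docs/index.html", links).each { |uri| puts uri }
# prints:
#   http://example.com/docs/guide.html
#   http://example.com/about
#   http://other.example/x
```

Without a base, relative references stay relative, so callers that need absolute URIs should pass one.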
|
||||||
|
|
||||||
|
# +each_hyperlink+ traverses hyperlinks such as HTML href attribute
|
||||||
|
# of A element.
|
||||||
|
#
|
||||||
|
# It yields Hpricot::Text.
|
||||||
|
#
|
||||||
|
# Note that +each_hyperlink+ also yields the href attribute of the BASE element.
|
||||||
|
def each_hyperlink # :yields: text
|
||||||
|
links = []
|
||||||
|
each_hyperlink_attribute {|elem, attr, hyperlink|
|
||||||
|
yield hyperlink
|
||||||
|
}
|
||||||
|
end
|
||||||
|
|
||||||
|
# +each_uri+ traverses hyperlinks such as HTML href attribute
|
||||||
|
# of A element.
|
||||||
|
#
|
||||||
|
# It yields URI for each hyperlink.
|
||||||
|
#
|
||||||
|
# The URI objects are created with a base URI which is given by
|
||||||
|
# HTML BASE element or the argument ((|base_uri|)).
|
||||||
|
def each_uri(base_uri=nil) # :yields: URI
|
||||||
|
each_hyperlink_uri(base_uri) {|hyperlink, uri| yield uri }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# :stopdoc:
|
||||||
|
module Doc::Trav
|
||||||
|
def traverse_all_element(&block)
|
||||||
|
children.each {|c| c.traverse_all_element(&block) }
|
||||||
|
end
|
||||||
|
def xpath
|
||||||
|
"/"
|
||||||
|
end
|
||||||
|
def css_path
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Elem::Trav
|
||||||
|
def traverse_all_element(&block)
|
||||||
|
yield self
|
||||||
|
children.each {|c| c.traverse_all_element(&block) }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Leaf::Trav
|
||||||
|
def traverse_all_element
|
||||||
|
yield self
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Doc::Trav
|
||||||
|
def traverse_some_element(name_set, &block)
|
||||||
|
children.each {|c| c.traverse_some_element(name_set, &block) }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Elem::Trav
|
||||||
|
def traverse_some_element(name_set, &block)
|
||||||
|
yield self if name_set.include? self.name
|
||||||
|
children.each {|c| c.traverse_some_element(name_set, &block) }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Leaf::Trav
|
||||||
|
def traverse_some_element(name_set)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
# :startdoc:
|
||||||
|
|
||||||
|
module Traverse
|
||||||
|
# +traverse_text+ traverses the text nodes in the tree.
|
||||||
|
def traverse_text(&block) # :yields: text
|
||||||
|
traverse_text_internal(&block)
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# :stopdoc:
|
||||||
|
module Container::Trav
|
||||||
|
def traverse_text_internal(&block)
|
||||||
|
each_child {|c| c.traverse_text_internal(&block) }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Leaf::Trav
|
||||||
|
def traverse_text_internal
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Text::Trav
|
||||||
|
def traverse_text_internal
|
||||||
|
yield self
|
||||||
|
end
|
||||||
|
end
|
||||||
|
# :startdoc:
|
||||||
|
|
||||||
|
module Container::Trav
|
||||||
|
# +filter+ rebuilds the tree without some components.
|
||||||
|
#
|
||||||
|
# node.filter {|descendant_node| predicate } -> node
|
||||||
|
# loc.filter {|descendant_loc| predicate } -> node
|
||||||
|
#
|
||||||
|
# +filter+ yields each node except the top node.
|
||||||
|
# If the given block returns false, the corresponding node is dropped.
|
||||||
|
# If the given block returns true, the corresponding node is retained and its
|
||||||
|
# inner nodes are examined.
|
||||||
|
#
|
||||||
|
# +filter+ returns a node.
|
||||||
|
# It doesn't return a location object even if self is a location object.
|
||||||
|
#
|
||||||
|
def filter(&block)
|
||||||
|
subst = {}
|
||||||
|
each_child_with_index {|descendant, i|
|
||||||
|
if yield descendant
|
||||||
|
if descendant.elem?
|
||||||
|
subst[i] = descendant.filter(&block)
|
||||||
|
else
|
||||||
|
subst[i] = descendant
|
||||||
|
end
|
||||||
|
else
|
||||||
|
subst[i] = nil
|
||||||
|
end
|
||||||
|
}
|
||||||
|
to_node.subst_subnode(subst)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Doc::Trav
|
||||||
|
# +title+ searches for the title and returns it as text.
|
||||||
|
# It returns nil if not found.
|
||||||
|
#
|
||||||
|
# +title+ searches the following information.
|
||||||
|
#
|
||||||
|
# - <title>...</title> in HTML
|
||||||
|
# - <title>...</title> in RSS
|
||||||
|
def title
|
||||||
|
e = find_element('title',
|
||||||
|
'{http://www.w3.org/1999/xhtml}title',
|
||||||
|
'{http://purl.org/rss/1.0/}title',
|
||||||
|
'{http://my.netscape.com/rdf/simple/0.9/}title')
|
||||||
|
e && e.extract_text
|
||||||
|
end
|
||||||
|
|
||||||
|
# +author+ searches for the author and returns it as text.
|
||||||
|
# It returns nil if not found.
|
||||||
|
#
|
||||||
|
# +author+ searches the following information.
|
||||||
|
#
|
||||||
|
# - <meta name="author" content="author-name"> in HTML
|
||||||
|
# - <link rev="made" title="author-name"> in HTML
|
||||||
|
# - <dc:creator>author-name</dc:creator> in RSS
|
||||||
|
# - <dc:publisher>author-name</dc:publisher> in RSS
|
||||||
|
def author
|
||||||
|
traverse_element('meta',
|
||||||
|
'{http://www.w3.org/1999/xhtml}meta') {|e|
|
||||||
|
begin
|
||||||
|
next unless e.fetch_attr('name').downcase == 'author'
|
||||||
|
author = e.fetch_attribute('content').strip
|
||||||
|
return author if !author.empty?
|
||||||
|
rescue IndexError
|
||||||
|
end
|
||||||
|
}
|
||||||
|
|
||||||
|
traverse_element('link',
|
||||||
|
'{http://www.w3.org/1999/xhtml}link') {|e|
|
||||||
|
begin
|
||||||
|
next unless e.fetch_attr('rev').downcase == 'made'
|
||||||
|
author = e.fetch_attribute('title').strip
|
||||||
|
return author if !author.empty?
|
||||||
|
rescue IndexError
|
||||||
|
end
|
||||||
|
}
|
||||||
|
|
||||||
|
if channel = find_element('{http://purl.org/rss/1.0/}channel')
|
||||||
|
channel.traverse_element('{http://purl.org/dc/elements/1.1/}creator') {|e|
|
||||||
|
begin
|
||||||
|
author = e.extract_text.strip
|
||||||
|
return author if !author.empty?
|
||||||
|
rescue IndexError
|
||||||
|
end
|
||||||
|
}
|
||||||
|
channel.traverse_element('{http://purl.org/dc/elements/1.1/}publisher') {|e|
|
||||||
|
begin
|
||||||
|
author = e.extract_text.strip
|
||||||
|
return author if !author.empty?
|
||||||
|
rescue IndexError
|
||||||
|
end
|
||||||
|
}
|
||||||
|
end
|
||||||
|
|
||||||
|
nil
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
module Doc::Trav
|
||||||
|
def root
|
||||||
|
es = []
|
||||||
|
children.each {|c| es << c if c.elem? }
|
||||||
|
raise Hpricot::Error, "no element" if es.empty?
|
||||||
|
raise Hpricot::Error, "multiple top elements" if 1 < es.length
|
||||||
|
es[0]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
module Elem::Trav
|
||||||
|
def has_attribute?(name)
|
||||||
|
self.raw_attributes && self.raw_attributes.has_key?(name.to_s)
|
||||||
|
end
|
||||||
|
def get_attribute(name)
|
||||||
|
a = self.raw_attributes && self.raw_attributes[name.to_s]
|
||||||
|
a = Hpricot.uxs(a) if a
|
||||||
|
a
|
||||||
|
end
|
||||||
|
alias_method :[], :get_attribute
|
||||||
|
def set_attribute(name, val)
|
||||||
|
altered!
|
||||||
|
self.raw_attributes ||= {}
|
||||||
|
self.raw_attributes[name.to_s] = Hpricot.xs(val)
|
||||||
|
end
|
||||||
|
alias_method :[]=, :set_attribute
|
||||||
|
def remove_attribute(name)
|
||||||
|
name = name.to_s
|
||||||
|
if has_attribute? name
|
||||||
|
altered!
|
||||||
|
self.raw_attributes.delete(name)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
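The attribute accessors in Elem::Trav above escape on write and unescape on read: set_attribute stores Hpricot.xs(val) in raw_attributes, and get_attribute returns Hpricot.uxs of the stored value. The following is a minimal standalone sketch of that round trip, not Hpricot's implementation. The class name AttrSketch and its two lookup tables are assumptions made for illustration, it handles only the four predefined entities, and it leaves out the altered! bookkeeping.

```ruby
# Standalone sketch; AttrSketch is a hypothetical class, not Hpricot::Elem.
class AttrSketch
  ESCAPES   = { '&' => '&amp;', '"' => '&quot;', '<' => '&lt;', '>' => '&gt;' }.freeze
  UNESCAPES = ESCAPES.invert.freeze

  attr_reader :raw_attributes

  def initialize
    @raw_attributes = nil # as in the diff, the attribute hash may be absent
  end

  def has_attribute?(name)
    !@raw_attributes.nil? && @raw_attributes.key?(name.to_s)
  end

  # Read path: look up the stored (escaped) value and unescape it.
  def get_attribute(name)
    raw = @raw_attributes && @raw_attributes[name.to_s]
    raw && raw.gsub(/&(?:amp|quot|lt|gt);/, UNESCAPES)
  end
  alias_method :[], :get_attribute

  # Write path: escape before storing, so raw_attributes holds markup-safe text.
  def set_attribute(name, val)
    @raw_attributes ||= {}
    @raw_attributes[name.to_s] = val.to_s.gsub(/[&"<>]/, ESCAPES)
  end
  alias_method :[]=, :set_attribute

  def remove_attribute(name)
    name = name.to_s
    @raw_attributes.delete(name) if has_attribute?(name)
  end
end

el = AttrSketch.new
el['href'] = 'a?x=1&y=2'
el.raw_attributes['href'] # escaped form kept in storage
el['href']                # unescaped form handed back to the caller
```

Keeping the escaped form in storage means serialization can write raw_attributes straight into markup, while callers of [] never see entities.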
@ -0,0 +1,94 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
# The XChar library is provided courtesy of Sam Ruby (See
|
||||||
|
# http://intertwingly.net/stories/2005/09/28/xchar.rb)
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------
|
||||||
|
|
||||||
|
######################################################################
|
||||||
|
module Hpricot
|
||||||
|
|
||||||
|
####################################################################
|
||||||
|
# XML Character converter, from Sam Ruby:
|
||||||
|
# (see http://intertwingly.net/stories/2005/09/28/xchar.rb).
|
||||||
|
#
|
||||||
|
module XChar # :nodoc:
|
||||||
|
|
||||||
|
# See
|
||||||
|
# http://intertwingly.net/stories/2004/04/14/i18n.html#CleaningWindows
|
||||||
|
# for details.
|
||||||
|
CP1252 = { # :nodoc:
|
||||||
|
128 => 8364, # euro sign
|
||||||
|
130 => 8218, # single low-9 quotation mark
|
||||||
|
131 => 402, # latin small letter f with hook
|
||||||
|
132 => 8222, # double low-9 quotation mark
|
||||||
|
133 => 8230, # horizontal ellipsis
|
||||||
|
134 => 8224, # dagger
|
||||||
|
135 => 8225, # double dagger
|
||||||
|
136 => 710, # modifier letter circumflex accent
|
||||||
|
137 => 8240, # per mille sign
|
||||||
|
138 => 352, # latin capital letter s with caron
|
||||||
|
139 => 8249, # single left-pointing angle quotation mark
|
||||||
|
140 => 338, # latin capital ligature oe
|
||||||
|
142 => 381, # latin capital letter z with caron
|
||||||
|
145 => 8216, # left single quotation mark
|
||||||
|
146 => 8217, # right single quotation mark
|
||||||
|
147 => 8220, # left double quotation mark
|
||||||
|
148 => 8221, # right double quotation mark
|
||||||
|
149 => 8226, # bullet
|
||||||
|
150 => 8211, # en dash
|
||||||
|
151 => 8212, # em dash
|
||||||
|
152 => 732, # small tilde
|
||||||
|
153 => 8482, # trade mark sign
|
||||||
|
154 => 353, # latin small letter s with caron
|
||||||
|
155 => 8250, # single right-pointing angle quotation mark
|
||||||
|
156 => 339, # latin small ligature oe
|
||||||
|
158 => 382, # latin small letter z with caron
|
||||||
|
159 => 376, # latin capital letter y with diaeresis
|
||||||
|
}
|
||||||
|
|
||||||
|
# See http://www.w3.org/TR/REC-xml/#dt-chardata for details.
|
||||||
|
PREDEFINED = {
|
||||||
|
34 => '&quot;', # quotation mark
|
||||||
|
38 => '&amp;', # ampersand
|
||||||
|
60 => '&lt;', # left angle bracket
|
||||||
|
62 => '&gt;' # right angle bracket
|
||||||
|
}
|
||||||
|
PREDEFINED_U = PREDEFINED.inject({}) { |hsh, (k, v)| hsh[v] = k; hsh }
|
||||||
|
|
||||||
|
# See http://www.w3.org/TR/REC-xml/#charsets for details.
|
||||||
|
VALID = [
|
||||||
|
0x9, 0xA, 0xD,
|
||||||
|
(0x20..0xD7FF),
|
||||||
|
(0xE000..0xFFFD),
|
||||||
|
(0x10000..0x10FFFF)
|
||||||
|
]
|
||||||
|
end
|
||||||
|
|
||||||
|
class << self
|
||||||
|
# XML escaped version of chr
|
||||||
|
def xchr(str)
|
||||||
|
n = XChar::CP1252[str] || str
|
||||||
|
case n when *XChar::VALID
|
||||||
|
XChar::PREDEFINED[n] or (n<128 ? n.chr : "&##{n};")
|
||||||
|
else
|
||||||
|
'*'
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
# XML escaped version of to_s
|
||||||
|
def xs(str)
|
||||||
|
str.to_s.unpack('U*').map {|n| xchr(n)}.join # ASCII, UTF-8
|
||||||
|
rescue
|
||||||
|
str.to_s.unpack('C*').map {|n| xchr(n)}.join # ISO-8859-1, WIN-1252
|
||||||
|
end
|
||||||
|
|
||||||
|
# XML unescape
|
||||||
|
def uxs(str)
|
||||||
|
str.to_s.
|
||||||
|
gsub(/\&\w+;/) { |x| (XChar::PREDEFINED_U[x] || ??).chr }.
|
||||||
|
gsub(/\&\#(\d+);/) { [$1.to_i].pack("U*") }
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
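The escaping helpers above (xchr, xs, uxs) can be tried without Hpricot installed. The following is a minimal standalone sketch that mirrors their logic on current Ruby versions. The module name XmlEscapeSketch is an assumption, the CP1252 table is cut down to two sample entries, and the fallback to unpack('C*') for non-UTF-8 input is omitted, so it should be read as an illustration rather than the library's code.

```ruby
# Standalone sketch; XmlEscapeSketch is a hypothetical name, not part of Hpricot.
module XmlEscapeSketch
  # The four characters that get a named entity.
  PREDEFINED = {
    34 => '&quot;', # quotation mark
    38 => '&amp;',  # ampersand
    60 => '&lt;',   # left angle bracket
    62 => '&gt;'    # right angle bracket
  }.freeze
  PREDEFINED_U = PREDEFINED.invert.freeze

  # Two sample entries from the Windows-1252 remapping table (euro sign, bullet).
  CP1252 = { 128 => 8364, 149 => 8226 }.freeze

  # Code points allowed in XML 1.0 character data.
  VALID = [0x9, 0xA, 0xD, (0x20..0xD7FF), (0xE000..0xFFFD), (0x10000..0x10FFFF)].freeze

  # Escape one code point: named entity, plain ASCII, numeric reference, or '*'.
  def self.xchr(n)
    n = CP1252.fetch(n, n)
    return '*' unless VALID.any? { |v| v === n }
    PREDEFINED[n] || (n < 128 ? n.chr : "&##{n};")
  end

  # Escape a whole string, code point by code point.
  def self.xs(str)
    str.to_s.unpack('U*').map { |n| xchr(n) }.join
  end

  # Reverse the escaping: named entities first, then numeric references.
  # Unknown named entities become "?".
  def self.uxs(str)
    str.to_s
       .gsub(/&\w+;/) { |x| (PREDEFINED_U[x] || '?'.ord).chr }
       .gsub(/&#(\d+);/) { [$1.to_i].pack('U*') }
  end
end

escaped = XmlEscapeSketch.xs(%{x & "y" < z}) # named entities for & " <
XmlEscapeSketch.uxs(escaped)                 # back to the original text
XmlEscapeSketch.xs("\u0080")                 # code point 128 is remapped to 8364
```

Because uxs handles named entities in a single pass before numeric references, an escaped ampersand followed by "lt;" comes back as the literal text "&lt;" and is not unescaped a second time.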
@ -0,0 +1,17 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
||||||
|
<head>
|
||||||
|
<title>Sample XHTML</title>
|
||||||
|
<link rel='stylesheet' href='test1.css' />
|
||||||
|
<link rel='stylesheet' href='test2.css' />
|
||||||
|
<link rel='stylesheet' href='test3.css' />
|
||||||
|
</head>
|
||||||
|
<body id='body1'>
|
||||||
|
<p>Sample XHTML for <a id="link1" href="http://code.whytheluckystiff.net/mouseHole/">MouseHole 2</a>.</p>
|
||||||
|
<p class='ohmy'>Please filter <a id="link2" href="http://hobix.com/">me</a>!</p>
|
||||||
|
<p>The third paragraph</p>
|
||||||
|
<p class="last final"><b>THE FINAL PARAGRAPH</b></p>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
|
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
@ -0,0 +1,16 @@
|
|||||||
|
<html>
|
||||||
|
<HEAD>
|
||||||
|
<meta http-equiv="Refresh" content="0; url=http://tenderlovemaking.com">
|
||||||
|
<META http-equiv="Refresh" content="0; url=http://tenderlovemaking.com">
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<a href ="http://tenderlovemaking.com/">My Site!</a>
|
||||||
|
<A href ="http://whytheluckystiff.net/">Your Site!</A>
|
||||||
|
<MAP>
|
||||||
|
<area HREF="http://whytheluckystiff.net/" COORDS="1,2,3,4"></area>
|
||||||
|
<AREA HREF="http://tenderlovemaking.com/" COORDS="1,2,3,4">
|
||||||
|
</area>
|
||||||
|
<AREA HREF="http://tenderlovemaking.com/" COORDS="5,5,10,10" />
|
||||||
|
</MAP>
|
||||||
|
</body>
|
||||||
|
</html>
|
@ -0,0 +1,220 @@
|
|||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||||
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||||
|
<head>
|
||||||
|
<title>Free Genealogy and Family History Online - The USGenWeb Project</title>
|
||||||
|
<meta name="keywords" content="free genealogy search" />
|
||||||
|
<meta name="description" content="Free genealogy and family history online made possible by the USGenWeb Project volunteers. Search free genealogy websites for your ancestors." />
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
|
||||||
|
<link rel="stylesheet" type="text/css" href="usgw-layout.css" />
|
||||||
|
<link rel="stylesheet" type="text/css" href="usgw.css" />
|
||||||
|
<style type="text/css">
|
||||||
|
<!--
|
||||||
|
.pullquote {
|
||||||
|
font-family: Verdana, Arial, Helvetica, sans-serif;
|
||||||
|
font-size: 12px;
|
||||||
|
float: right;
|
||||||
|
width: 185px;
|
||||||
|
margin-top: 10px;
|
||||||
|
margin-bottom: 2px;
|
||||||
|
border-top-width: 10px;
|
||||||
|
border-bottom-width: 3px;
|
||||||
|
border-top-style: solid;
|
||||||
|
border-bottom-style: solid;
|
||||||
|
border-top-color: #38386E;
|
||||||
|
border-right-color: #38386E;
|
||||||
|
border-bottom-color: #38386E;
|
||||||
|
border-left-color: #38386E;
|
||||||
|
font-style: italic;
|
||||||
|
font-weight: normal;
|
||||||
|
border-right-width: 1px;
|
||||||
|
border-left-width: 1px;
|
||||||
|
border-right-style: solid;
|
||||||
|
border-left-style: solid;
|
||||||
|
padding: 3px;
|
||||||
|
}
|
||||||
|
.style2 {color: #003366}
|
||||||
|
-->
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<!-- HEADER DIV -->
|
||||||
|
<div id="hdr">
|
||||||
|
<div align="center"><img alt="The USGenWeb Project, Free Genealogy Online" src="images/widelogo.jpg" width="740" height="150" /></div>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
<!-- HEADER LINKS -->
|
||||||
|
<div id="hdr2">
|
||||||
|
|
||||||
|
<div align="center"><img src="images/navbar.gif" width="740" height="30" usemap="#Map" border="0" />
|
||||||
|
<map name="Map">
|
||||||
|
<area shape="rect" coords="46,1,126,28" href="index.shtml" alt="Home">
|
||||||
|
<area shape="rect" coords="134,1,223,28" href="about/index.shtml" alt="About Us">
|
||||||
|
<area shape="rect" coords="239,1,320,30" href="states/index.shtml" alt="States">
|
||||||
|
<area shape="rect" coords="332,1,424,28" href="projects/index.shtml" alt="Projects">
|
||||||
|
<area shape="rect" coords="444,2,555,28" href="research/index.shtml" alt="Researchers">
|
||||||
|
<area shape="rect" coords="575,0,686,28" href="volunteers/index.shtml" alt="Volunteers">
|
||||||
|
</map>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
<!-- CENTER COLUMN -->
|
||||||
|
<div id="c-block">
|
||||||
|
<div id="c-col">
|
||||||
|
<p> </p>
|
||||||
|
<h3 align="center">Keeping Internet Genealogy Free<br />
|
||||||
|
<br />
|
||||||
|
</h3>
|
||||||
|
<div align="left">
|
||||||
|
<div>
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<td><div class="pullquote">
|
||||||
|
<p align="center"><span class="style2"><a href="states/counties.shtml">Counties of the Month</a></span><br />
|
||||||
|
<a href="http://www.rootsweb.com/~inmontgo/">Montgomery County, IN</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~flalachu/">Alachua County, FL</a><br />
|
||||||
|
<br />
|
||||||
|
<span class="style2"><a href="volunteers/FGS.shtml">Upcoming Events</a></span><br />
|
||||||
|
FGS Conference 2006<br />
|
||||||
|
<br />
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
<p><img src="photos/Gena-Farnham-Wallace.jpg" width="150" height="205" align="left" />
|
||||||
|
<p>Welcome to The USGenWeb Project! We are a group of volunteers working together to provide free genealogy websites for genealogical research in every county and every state of the United States. This Project is non-commercial and fully committed to free genealogy access for everyone.</p>
|
||||||
|
<p>Organization is by county and state, and this website provides you with links to all the state genealogy websites which, in turn, provide gateways to the counties. The USGenWeb Project also sponsors important Special Projects at the national level and this website provides an entry point to all of those pages, as well.</p>
|
||||||
|
<p>Clicking on a State Link (on the left) will take you to the State's website. Clicking on the tabs above will take you to additional information and links. </p>
|
||||||
|
<p>All of the volunteers who make up The USGenWeb Project are very proud of this endeavor and hope that you will find their hard work both beneficial and rewarding. Thank you for visiting!</p>
|
||||||
|
<p>The USGenWeb Project Team
|
||||||
|
</p>
|
||||||
|
<h3 align="center">10th Anniversary<br /> <br />
|
||||||
|
</h3>
|
||||||
|
<div align="left">
|
||||||
|
<p><img src="photos/oldphoto1.jpg" width="175" height="200" align="right" />2006 marks the 10th Anniversary of the USGenWeb Project and I have been looking back over those past 10 years. When the USGenWeb Project began, it was one of the few (if not the only) centralized places on the internet to find genealogy information and post a query. Those early state and county sites began with links to the small amount of on-line information of interest to a family historian and a query page. The only Special Project was the Archives. How far the Project has come during the past 10 years! Now there are several special projects and the states, counties and special projects sites of the Project not only contain links; they are filled with information and transcribed records, and more is being added every day by our wonderful, dedicated and hard working volunteers.</p>
|
||||||
|
<p>Ten years ago the internet, as we know it today, was in its infancy. The things we take for granted today--e-mail, PCs, cell phones, digital cameras, etc., were not in the average person's world. Family historians and professional genealogists not only didn't use the internet, most had never heard of it.</p>
|
||||||
|
<p>Over the past 10 years the internet has gone from obscurity to commonplace. As the internet became an every day tool for millions of people. it changed the way family historians do research. The availability of on-line, easily accessible genealogy and historical information has fueled the phenomenal growth of Genealogy as a hobby and, I'm proud to say, the Project has been right there every step of the way. </p>
|
||||||
|
<p>Everywhere we look we see genealogy reported as the fastest growing hobby in the country. Now the internet is the first stop for beginning family historians and is used extensively by experienced researchers. New "How To" genealogy books devote chapters to using the internet, and it is a rare book that does not recommend The USGenWeb Project as one of the first places to visit.</p>
|
||||||
|
<p>While subscription sites have popped up everywhere on the web, The Project has continued to offer free access to its vast wealth of information. The USGenWeb Project is recognized as the premier site of free information, and the Project's websites welcome well over a million visitors each day.</p>
|
||||||
|
<p>The Project is where it is today because of the thousands of volunteers, both past and present, who cared enough to devote, collectively, millions of hours to gathering, transcribing and uploading information. </p>
|
||||||
|
<p>To each and every volunteer, past and present, a heartfelt Thank You, because you are ones who have made The Project the fabulous resource it is today.</p>
|
||||||
|
<p>Linda Haas Davenport<br />
|
||||||
|
National Coordinator<br />
|
||||||
|
The USGenWeb Project</p>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<p></p></td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<br />
|
||||||
|
</div>
|
||||||
|
<!-- END CENTER COLUMN --></div>
|
||||||
|
<!-- END C-BLOCK -->
|
||||||
|
<div id="ftr">
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
<div align="center"><img src="images/footer-bar.gif" width="740" height="30" usemap="#footerMap" border="0" /></div>
|
||||||
|
<map name="footerMap">
|
||||||
|
<area shape="rect" coords="430,6,565,25" href="http://www.usgenweb.org">
|
||||||
|
</map>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<!-- LEFT COLUMN -->
|
||||||
|
<div id="lh-col">
|
||||||
|
<span style="margin:10px 10px 10px 10px;"><br />
|
||||||
|
<a href="http://www.rootsweb.com/~algenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Alabama Genealogy">Alabama</a><br />
|
||||||
|
<a href="http://www.akgenweb.org" rel="nofollow" class="sidenavLnk" target=_blank" title="Alaska Genealogy">Alaska</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~azgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Arizona Genealogy">Arizona</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~argenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Arkansas Genealogy">Arkansas</a><br />
|
||||||
|
<a href="http://cagenweb.com/" rel="nofollow" class="sidenavLnk" target=_blank" title="California Genealogy">California</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~cogenweb/comain.htm" rel="nofollow" class="sidenavLnk" target=_blank" title="Colorado Genealogy">Colorado</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~ctgenweb" rel="nofollow" class="sidenavLnk" target=_blank" title="Connecticut Genealogy">Connecticut</a><br />
|
||||||
|
<a href="http://www.degenweb.org/" rel="nofollow" class="sidenavLnk" target=_blank" title="Delaware Genealogy">Delaware</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~dcgenweb/dc_genweb.htm" rel="nofollow" class="sidenavLnk" target=_blank" title="District of Columbia Genealogy">District of Columbia</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~flgenweb/index.html" rel="nofollow" class="sidenavLnk" target=_blank" title="Florida Genealogy">Florida</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~gagenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Georgia Genealogy">Georgia</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~higenweb/hawaii.htm" rel="nofollow" class="sidenavLnk" target=_blank" title="Hawaii Genealogy">Hawaii</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~idgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Idaho Genealogy">Idaho</a><br />
|
||||||
|
<a href="http://ilgenweb.rootsweb.com/" rel="nofollow" class="sidenavLnk" target=_blank" title="Illinois Genealogy">Illinois</a><br />
|
||||||
|
<a href="http://www.ingenweb.org" rel="nofollow" class="sidenavLnk" target=_blank" title="Indiana Genealogy">Indiana</a><br />
|
||||||
|
<a href="http://IAGenWeb.org" rel="nofollow" class="sidenavLnk" target=_blank" title="Iowa Genealogy">Iowa</a><br />
|
||||||
|
<a href="http://skyways.lib.ks.us/genweb/index.html" rel="nofollow" class="sidenavLnk" target=_blank" title="Kansas Genealogy">Kansas</a><br />
|
||||||
|
<a href="http://www.kygenweb.net/index.html" rel="nofollow" class="sidenavLnk" target=_blank" title="Kentucky Genealogy">Kentucky</a><br />
|
||||||
|
<a href="http://www.lagenweb.org/" rel="nofollow" class="sidenavLnk" target=_blank" title="Louisiana Genealogy">Louisiana</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~megenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Maine Genealogy">Maine</a><br />
|
||||||
|
<a href="http://www.mdgenweb.org" rel="nofollow" class="sidenavLnk" target=_blank" title="Maryland Genealogy">Maryland</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~magenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Massachusetts Genealogy">Massachusetts</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~migenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Michigan Genealogy">Michigan</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~mngenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Minnesota Genealogy">Minnesota</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~msgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Mississippi Genealogy">Mississippi</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~mogenweb/mo.htm" rel="nofollow" class="sidenavLnk" target=_blank" title="Missouri Genealogy">Missouri</a><br />
|
||||||
|
<a href="http://rootsweb.com/~mtgenweb" rel="nofollow" class="sidenavLnk" target=_blank" title="Montana Genealogy">Montana</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~negenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Nebraska Genealogy">Nebraska</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~nvgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Nevada Genealogy">Nevada</a><br />
|
||||||
|
<a href="http://www.usroots.com/~usgwnhus/" rel="nofollow" class="sidenavLnk" target=_blank" title="New Hampshire Genealogy">New Hampshire</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~njgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="New Jersey Genealogy">New Jersey</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~nmgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="New Mexico Genealogy">New Mexico</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~nygenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="New York Genealogy">New York</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~ncgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="North Carolina Genealogy">North Carolina</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~ndgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="North Dakota Genealogy">North Dakota</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~ohgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Ohio Genealogy">Ohio</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~okgenweb/index.htm" rel="nofollow" class="sidenavLnk" target=_blank" title="Oklahoma Genealogy">Oklahoma</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~itgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Oklahoma-Indian Territory Genealogy">Oklahoma/Indian Territory</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~orgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Oregon Genealogy">Oregon</a><br />
|
||||||
|
<a href="http://www.pagenweb.org/" rel="nofollow" class="sidenavLnk" target=_blank" title="Pennsylvania Genealogy">Pennsylvania</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~rigenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Rhode Island Genealogy">Rhode Island</a><br />
|
||||||
|
<a href="http://sciway3.net/scgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="South Carolina Genealogy">South Carolina</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~sdgenweb" rel="nofollow" class="sidenavLnk" target=_blank" title="South Dakota Genealogy">South Dakota</a><br />
|
||||||
|
<a href="http://www.tngenweb.org/" rel="nofollow" class="sidenavLnk" target=_blank" title="Tennessee Genealogy">Tennessee</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~txgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Texas Genealogy">Texas</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~utgenweb/index.html" rel="nofollow" class="sidenavLnk" target=_blank" title="Utah Genealogy">Utah</a><br />
|
||||||
|
<a href="http://home.att.net/~Local_History/VT_History.htm" rel="nofollow" class="sidenavLnk" target=_blank" title="Vermont Genealogy">Vermont</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~vagenweb" rel="nofollow" class="sidenavLnk" target=_blank" title="Virginia Genealogy">Virginia</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~wagenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Washington Genealogy">Washington</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~wvgenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="West Virginia Genealogy">West Virginia</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~wigenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Wisconsin Genealogy">Wisconsin</a><br />
|
||||||
|
<a href="http://www.rootsweb.com/~wygenweb/" rel="nofollow" class="sidenavLnk" target=_blank" title="Wyoming Genealogy">Wyoming</a>
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
<!-- END LEFT COLUMN -->
|
||||||
|
<!-- RIGHT COLUMN -->
|
||||||
|
<div id="rh-col">
|
||||||
|
<br />
|
||||||
|
<span style="margin: 10px 0px 6px 6px;">
|
||||||
|
<div align="center">
|
||||||
|
<p><img alt="The USGenWeb Project, Free Genealogy Online" src="images/usgenweb100x104.gif" width="100" height="104" /></p></div></span>
|
||||||
|
<span style="margin: 10px 0px 6px 6px;">
|
||||||
|
<div align="left">
|
||||||
|
<!-- <h4>Search Engines</h4> -->
|
||||||
|
<p><a href="../states/counties.shtml" rel="nofollow" class="sidenavLnk">County Spotlight</a><br />
|
||||||
|
|
||||||
|
<p><a href="http://www.rootsweb.com/~usgenweb/newsearch.htm" rel="nofollow" class="sidenavLnk" target="_blank"> Project Archives</a><br />
|
||||||
|
</div>
|
||||||
|
<div align="center">
|
||||||
|
<hr width="75%" size="1" noshade />
|
||||||
|
</div>
|
||||||
|
<div align="left">
|
||||||
|
<p align="left" class="sidenav">Comments and administrative-type problems should be emailed to the <a href="mailto:lhaasdav@cox.net" class="link">National Coordinator</a>.
|
||||||
|
For complaints regarding a specific web site within the USGenWeb Project, please include the URL when emailing the National Coordinator.</p>
|
||||||
|
<p align="left" class="sidenav">Direct comments or suggestions about this web site to the <a href="mailto:webmaster@usgenweb.com" class="link">Webmaster</a>. </p>
|
||||||
|
<br />
|
||||||
|
<p align="center"><a href="http://www.rootsweb.com" rel="nofollow"><img src="images/rootsweb-blue-68x85.gif" width="68" height="85" border="0" alt="Visit Rootsweb"></a></p>
|
||||||
|
</div>
|
||||||
|
<p>
|
||||||
|
<a href="index.shtml" class="sidenavLnk" title="The USGenWeb Project">Home</a><br />
|
||||||
|
<a href="about/index.shtml" class="sidenavLnk" title="About The USGenWeb Project">About Us</a><br />
|
||||||
|
<a href="projects/index.shtml" class="sidenavLnk" title="Genealogy Projects">Projects</a><br />
|
||||||
|
<a href="research/index.shtml" class="sidenavLnk" title="Help for Genealogy Research">for Researchers</a><br />
|
||||||
|
<a href="volunteers/index.shtml" class="sidenavLnk" title="USGenWeb Volunteers">for Volunteers</a><br />
|
||||||
|
<a href="sitemap.shtml" class="sidenavLnk">Site Map</a></p>
|
||||||
|
</span>
|
||||||
|
|
||||||
|
|
||||||
|
</div>
|
||||||
|
<!-- END RIGHT COLUMN -->
|
||||||
|
</body>
|
||||||
|
</html>
|
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -0,0 +1,19 @@
|
|||||||
|
<?xml version='1.0'?><rss xmlns:admin='http://webns.net/mvcb/' version='2.0' xmlns:sy='http://purl.org/rss/1.0/modules/syndication/' xmlns:dc='http://purl.org/dc/elements/1.1/' xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
|
||||||
|
<channel>
|
||||||
|
<title>why the lucky stiff</title>
|
||||||
|
<link>http://whytheluckystiff.net</link>
|
||||||
|
<description>hex-editing reality to give us infinite grenades!!</description>
|
||||||
|
<dc:language>en-us</dc:language>
|
||||||
|
<dc:creator/>
|
||||||
|
<dc:date>2007-01-16T22:39:04+00:00</dc:date>
|
||||||
|
<admin:generatorAgent rdf:resource='http://hobix.com/?v=0.4'/>
|
||||||
|
<sy:updatePeriod>hourly</sy:updatePeriod>
|
||||||
|
<sy:updateFrequency>1</sy:updateFrequency>
|
||||||
|
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
|
||||||
|
<item><title>1.3</title><link>http://whytheluckystiff.net/quatrains/1.3.html</link><guid isPermaLink='false'>quatrains/1.3@http://whytheluckystiff.net</guid><dc:subject>quatrains</dc:subject><dc:subject>quatrains</dc:subject><dc:creator>why the lucky stiff</dc:creator><dc:date>2007-01-14T08:47:05+00:00</dc:date><description><blockquote>
|
||||||
|
<p>That cadillac of yours and that driver of yours!<br />You and your teacups rattling away in the back seat!<br />You always took the mike, oh, and all those cowboys you shot!<br />I held your hand! And I&#8217;ll shoot a cowboy one day!</p>
|
||||||
|
</blockquote>
|
||||||
|
<blockquote>
|
||||||
|
<p>You said, &#8220;Let&#8217;s run into the woods like kids!&#8221; <br />You said, &#8220;Let&#8217;s rub our hands together super-hot!&#8221; <br />And we scalded the trees and left octagons, I think that was you and<br />You threw parties on the roof!</p>
|
||||||
|
</blockquote></description></item></channel>
|
||||||
|
</rss>
|
@ -0,0 +1,7 @@
|
|||||||
|
module TestFiles
|
||||||
|
Dir.chdir(File.dirname(__FILE__)) do
|
||||||
|
Dir['files/*.{html,xhtml,xml}'].each do |fname|
|
||||||
|
const_set fname[%r!/(\w+)\.\w+$!, 1].upcase, IO.read(fname)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
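The TestFiles helper above turns each fixture path into a constant: the regexp captures the basename, upcase makes it a valid constant name, and const_set binds it to the file contents (which is where TestFiles::BASIC and TestFiles::BOINGBOING in the tests come from). The following standalone sketch shows the same naming step without touching the disk. The module name TestFilesSketch and the sample paths and bodies are assumptions for illustration.

```ruby
# Standalone sketch; TestFilesSketch and the sample data are hypothetical.
module TestFilesSketch
  SAMPLES = {
    'files/basic.xhtml'     => '<p>basic</p>',
    'files/boingboing.html' => '<p>boingboing</p>'
  }.freeze

  SAMPLES.each do |fname, body|
    # Capture the word characters between the last "/" and the extension,
    # then upcase them to get a constant name such as BASIC.
    const_set(fname[%r!/(\w+)\.\w+$!, 1].upcase, body)
  end
end

TestFilesSketch::BASIC      # body registered for files/basic.xhtml
TestFilesSketch::BOINGBOING # body registered for files/boingboing.html
```

The capture group only accepts word characters, so a fixture name containing a hyphen or a second dot would not match, and the lookup would return nil before upcase is called.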
@ -0,0 +1,65 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
require 'test/unit'
|
||||||
|
require 'hpricot'
|
||||||
|
require 'load_files'
|
||||||
|
|
||||||
|
class TestAlter < Test::Unit::TestCase
|
||||||
|
def setup
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_before
|
||||||
|
test0 = "<link rel='stylesheet' href='test0.css' />"
|
||||||
|
@basic.at("link").before(test0)
|
||||||
|
assert_equal 'test0.css', @basic.at("link").attributes['href']
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_after
|
||||||
|
test_inf = "<link rel='stylesheet' href='test_inf.css' />"
|
||||||
|
@basic.search("link")[-1].after(test_inf)
|
||||||
|
assert_equal 'test_inf.css', @basic.search("link")[-1].attributes['href']
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_wrap
|
||||||
|
ohmy = (@basic/"p.ohmy").wrap("<div id='wrapper'></div>")
|
||||||
|
assert_equal 'wrapper', ohmy[0].parent['id']
|
||||||
|
assert_equal 'ohmy', Hpricot(@basic.to_html).at("#wrapper").children[0]['class']
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_add_class
|
||||||
|
first_p = (@basic/"p:first").add_class("testing123")
|
||||||
|
assert first_p[0].get_attribute("class").split(" ").include?("testing123")
|
||||||
|
assert (Hpricot(@basic.to_html)/"p:first")[0].attributes["class"].split(" ").include?("testing123")
|
||||||
|
assert !(Hpricot(@basic.to_html)/"p:gt(0)")[0].attributes["class"].split(" ").include?("testing123")
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_change_attributes
|
||||||
|
all_ps = (@basic/"p").attr("title", "Some Title")
|
||||||
|
all_as = (@basic/"a").attr("href", "http://my_new_href.com")
|
||||||
|
all_lb = (@basic/"link").attr("href") { |e| e.name }
|
||||||
|
assert_changed(@basic, "p", all_ps) {|p| p.attributes["title"] == "Some Title"}
|
||||||
|
assert_changed(@basic, "a", all_as) {|a| a.attributes["href"] == "http://my_new_href.com"}
|
||||||
|
assert_changed(@basic, "link", all_lb) {|a| a.attributes["href"] == "link" }
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_remove_attr
|
||||||
|
all_rl = (@basic/"link").remove_attr("href")
|
||||||
|
assert_changed(@basic, "link", all_rl) { |link| link['href'].nil? }
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_remove_class
|
||||||
|
all_c1 = (@basic/"p[@class*='last']").remove_class("last")
|
||||||
|
assert_changed(@basic, "p[@class*='last']", all_c1) { |p| p['class'] == 'final' }
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_remove_all_classes
|
||||||
|
all_c2 = (@basic/"p[@class]").remove_class
|
||||||
|
assert_changed(@basic, "p[@class]", all_c2) { |p| p['class'].nil? }
|
||||||
|
end
|
||||||
|
|
||||||
|
def assert_changed original, selector, set, &block
|
||||||
|
assert set.all?(&block)
|
||||||
|
assert Hpricot(original.to_html).search(selector).all?(&block)
|
||||||
|
end
|
||||||
|
end
|
@ -0,0 +1,24 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
require 'test/unit'
|
||||||
|
require 'hpricot'
|
||||||
|
|
||||||
|
class TestBuilder < Test::Unit::TestCase
|
||||||
|
def test_escaping_text
|
||||||
|
doc = Hpricot() { b "<a\"b>" }
|
||||||
|
assert_equal "<b>&lt;a&quot;b&gt;</b>", doc.to_html
|
||||||
|
assert_equal %{<a"b>}, doc.at("text()").to_s
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_no_escaping_text
|
||||||
|
doc = Hpricot() { div.test.me! { text "<a\"b>" } }
|
||||||
|
assert_equal %{<div class="test" id="me"><a"b></div>}, doc.to_html
|
||||||
|
assert_equal %{<a"b>}, doc.at("text()").to_s
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_latin1_entities
|
||||||
|
doc = Hpricot() { b "\200\225" }
|
||||||
|
assert_equal "<b>&#8364;&#8226;</b>", doc.to_html
|
||||||
|
assert_equal "\342\202\254\342\200\242", doc.at("text()").to_s
|
||||||
|
end
|
||||||
|
end
|
@ -0,0 +1,379 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
require 'test/unit'
|
||||||
|
require 'hpricot'
|
||||||
|
require 'load_files'
|
||||||
|
|
||||||
|
class TestParser < Test::Unit::TestCase
|
||||||
|
def test_set_attr
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
@basic.search('//p').set('class', 'para')
|
||||||
|
assert_equal 4, @basic.search('//p').length
|
||||||
|
assert_equal 4, @basic.search('//p').find_all { |x| x['class'] == 'para' }.length
|
||||||
|
end
|
||||||
|
|
||||||
|
# Test creating a new element
|
||||||
|
def test_new_element
|
||||||
|
elem = Hpricot::Elem.new(Hpricot::STag.new('form'))
|
||||||
|
assert_not_nil(elem)
|
||||||
|
assert_not_nil(elem.attributes)
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_scan_text
|
||||||
|
assert_equal 'FOO', Hpricot.make("FOO").first.content
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_filter_by_attr
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
|
||||||
|
# this link is escaped in the doc
|
||||||
|
link = 'http://www.youtube.com/watch?v=TvSNXyNw26g&search=chris%20ware'
|
||||||
|
assert_equal link, @boingboing.at("a[@href='#{link}']")['href']
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_filter_contains
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal '<title>Sample XHTML</title>', @basic.search("title:contains('Sample')").to_s
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_get_element_by_id
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal 'link1', @basic.get_element_by_id('link1')['id']
|
||||||
|
assert_equal 'link1', @basic.get_element_by_id('body1').get_element_by_id('link1').get_attribute('id')
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_get_element_by_tag_name
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal 'link1', @basic.get_elements_by_tag_name('a')[0].get_attribute('id')
|
||||||
|
assert_equal 'link1', @basic.get_elements_by_tag_name('body')[0].get_element_by_id('link1').get_attribute('id')
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_output_basic
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
@basic2 = Hpricot.parse(@basic.inner_html)
|
||||||
|
scan_basic @basic2
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_scan_basic
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
scan_basic @basic
|
||||||
|
end
|
||||||
|
|
||||||
|
def scan_basic doc
|
||||||
|
assert_kind_of Hpricot::XMLDecl, doc.children.first
|
||||||
|
assert_not_equal doc.children.first.to_s, doc.children[1].to_s
|
||||||
|
assert_equal 'link1', doc.at('#link1')['id']
|
||||||
|
assert_equal 'link1', doc.at("p a")['id']
|
||||||
|
assert_equal 'link1', (doc/:p/:a).first['id']
|
||||||
|
assert_equal 'link1', doc.search('p').at('a').get_attribute('id')
|
||||||
|
assert_equal 'link2', (doc/'p').filter('.ohmy').search('a').first.get_attribute('id')
|
||||||
|
assert_equal (doc/'p')[2], (doc/'p').filter(':nth(2)')[0]
|
||||||
|
assert_equal (doc/'p')[2], (doc/'p').filter('[3]')[0]
|
||||||
|
assert_equal 4, (doc/'p').filter('*').length
|
||||||
|
assert_equal 4, (doc/'p').filter('* *').length
|
||||||
|
eles = (doc/'p').filter('.ohmy')
|
||||||
|
assert_equal 1, eles.length
|
||||||
|
assert_equal 'ohmy', eles.first.get_attribute('class')
|
||||||
|
assert_equal 3, (doc/'p:not(.ohmy)').length
|
||||||
|
assert_equal 3, (doc/'p').not('.ohmy').length
|
||||||
|
assert_equal 3, (doc/'p').not(eles.first).length
|
||||||
|
assert_equal 2, (doc/'p').filter('[@class]').length
|
||||||
|
assert_equal 'last final', (doc/'p[@class~="final"]').first.get_attribute('class')
|
||||||
|
assert_equal 1, (doc/'p').filter('[@class~="final"]').length
|
||||||
|
assert_equal 2, (doc/'p > a').length
|
||||||
|
assert_equal 1, (doc/'p.ohmy > a').length
|
||||||
|
assert_equal 2, (doc/'p / a').length
|
||||||
|
assert_equal 2, (doc/'link ~ link').length
|
||||||
|
assert_equal 3, (doc/'title ~ link').length
|
||||||
|
assert_equal 5, (doc/"//p/text()").length
|
||||||
|
assert_equal 6, (doc/"//p[a]//text()").length
|
||||||
|
assert_equal 2, (doc/"//p/a/text()").length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_positional
|
||||||
|
h = Hpricot( "<div><br/><p>one</p><p>two</p></div>" )
|
||||||
|
assert_equal "<p>one</p>", h.search("//div/p:eq(0)").to_s
|
||||||
|
assert_equal "<p>one</p>", h.search("//div/p:first").to_s
|
||||||
|
assert_equal "<p>one</p>", h.search("//div/p:first()").to_s
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_pace
|
||||||
|
doc = Hpricot(TestFiles::PACE_APPLICATION)
|
||||||
|
assert_equal 'get', doc.at('form[@name=frmSect11]')['method']
|
||||||
|
# assert_equal '2', doc.at('#hdnSpouse')['value']
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_scan_boingboing
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
assert_equal 60, (@boingboing/'p.posted').length
|
||||||
|
assert_equal 1, @boingboing.search("//a[@name='027906']").length
|
||||||
|
assert_equal 10, @boingboing.search("script comment()").length
|
||||||
|
assert_equal 3, @boingboing.search("a[text()*='Boing']").length
|
||||||
|
assert_equal 1, @boingboing.search("h3[text()='College kids reportedly taking more smart drugs']").length
|
||||||
|
assert_equal 0, @boingboing.search("h3[text()='College']").length
|
||||||
|
assert_equal 60, @boingboing.search("h3").length
|
||||||
|
assert_equal 59, @boingboing.search("h3[text()!='College kids reportedly taking more smart drugs']").length
|
||||||
|
assert_equal 17, @boingboing.search("h3[text()$='s']").length
|
||||||
|
assert_equal 129, @boingboing.search("p[text()]").length
|
||||||
|
assert_equal 211, @boingboing.search("p").length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_reparent
|
||||||
|
doc = Hpricot(%{<div id="blurb_1"></div>})
|
||||||
|
div1 = doc.search('#blurb_1')
|
||||||
|
div1.before('<div id="blurb_0"></div>')
|
||||||
|
|
||||||
|
div0 = doc.search('#blurb_0')
|
||||||
|
div0.before('<div id="blurb_a"></div>')
|
||||||
|
|
||||||
|
assert_equal 'div', doc.at('#blurb_1').name
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_siblings
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
t = @basic.at(:title)
|
||||||
|
e = t.next_sibling
|
||||||
|
assert_equal 'test1.css', e['href']
|
||||||
|
assert_equal 'title', e.previous_sibling.name
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_css_negation
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal 3, (@basic/'p:not(.final)').length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_remove_attribute
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
(@basic/:p).each { |ele| ele.remove_attribute('class') }
|
||||||
|
assert_equal 0, (@basic/'p[@class]').length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_abs_xpath
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
assert_equal 60, @boingboing.search("/html/body//p[@class='posted']").length
|
||||||
|
assert_equal 60, @boingboing.search("/*/body//p[@class='posted']").length
|
||||||
|
assert_equal 18, @boingboing.search("//script").length
|
||||||
|
divs = @boingboing.search("//script/../div")
|
||||||
|
assert_equal 1, divs.length
|
||||||
|
imgs = @boingboing.search('//div/p/a/img')
|
||||||
|
assert_equal 15, imgs.length
|
||||||
|
assert_equal 17, @boingboing.search('//div').search('p/a/img').length
|
||||||
|
assert imgs.all? { |x| x.name == 'img' }
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_predicates
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
assert_equal 2, @boingboing.search('//link[@rel="alternate"]').length
|
||||||
|
p_imgs = @boingboing.search('//div/p[/a/img]')
|
||||||
|
assert_equal 15, p_imgs.length
|
||||||
|
assert p_imgs.all? { |x| x.name == 'p' }
|
||||||
|
p_imgs = @boingboing.search('//div/p[a/img]')
|
||||||
|
assert_equal 18, p_imgs.length
|
||||||
|
assert p_imgs.all? { |x| x.name == 'p' }
|
||||||
|
assert_equal 1, @boingboing.search('//input[@checked]').length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_tag_case
|
||||||
|
@tenderlove = Hpricot.parse(TestFiles::TENDERLOVE)
|
||||||
|
assert_equal 2, @tenderlove.search('//a').length
|
||||||
|
assert_equal 3, @tenderlove.search('//area').length
|
||||||
|
assert_equal 2, @tenderlove.search('//meta').length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_alt_predicates
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
assert_equal 1, @boingboing.search('//table/tr:last').length
|
||||||
|
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal "<p>The third paragraph</p>",
|
||||||
|
@basic.search('p:eq(2)').to_html
|
||||||
|
assert_equal '<p class="last final"><b>THE FINAL PARAGRAPH</b></p>',
|
||||||
|
@basic.search('p:last').to_html
|
||||||
|
assert_equal 'last final', @basic.search('//p:last-of-type').first.get_attribute('class')
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_insert_after # ticket #63
|
||||||
|
doc = Hpricot('<html><body><div id="a-div"></div></body></html>')
|
||||||
|
(doc/'div').each do |element|
|
||||||
|
element.after('<p>Paragraph 1</p><p>Paragraph 2</p>')
|
||||||
|
end
|
||||||
|
assert_equal '<html><body><div id="a-div"></div><p>Paragraph 1</p><p>Paragraph 2</p></body></html>', doc.to_html
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_insert_before # ticket #61
|
||||||
|
doc = Hpricot('<html><body><div id="a-div"></div></body></html>')
|
||||||
|
(doc/'div').each do |element|
|
||||||
|
element.before('<p>Paragraph 1</p><p>Paragraph 2</p>')
|
||||||
|
end
|
||||||
|
assert_equal '<html><body><p>Paragraph 1</p><p>Paragraph 2</p><div id="a-div"></div></body></html>', doc.to_html
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_many_paths
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
assert_equal 62, @boingboing.search('p.posted, link[@rel="alternate"]').length
|
||||||
|
assert_equal 20, @boingboing.search('//div/p[a/img]|//link[@rel="alternate"]').length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_stacked_search
|
||||||
|
@boingboing = Hpricot.parse(TestFiles::BOINGBOING)
|
||||||
|
assert_kind_of Hpricot::Elements, @boingboing.search('//div/p').search('a img')
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_class_search
|
||||||
|
# test case sent by Chih-Chao Lam
|
||||||
|
doc = Hpricot("<div class=xyz'>abc</div>")
|
||||||
|
assert_equal 1, doc.search(".xyz").length
|
||||||
|
doc = Hpricot("<div class=xyz>abc</div><div class=abc>xyz</div>")
|
||||||
|
assert_equal 1, doc.search(".xyz").length
|
||||||
|
assert_equal 4, doc.search("*").length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_kleene_star
|
||||||
|
# bug noticed by raja bhatia
|
||||||
|
doc = Hpricot("<span class='small'>1</span><div class='large'>2</div><div class='small'>3</div><span class='blue large'>4</span>")
|
||||||
|
assert_equal 2, doc.search("*[@class*='small']").length
|
||||||
|
assert_equal 2, doc.search("*.small").length
|
||||||
|
assert_equal 2, doc.search(".small").length
|
||||||
|
assert_equal 2, doc.search(".large").length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_empty_comment
|
||||||
|
doc = Hpricot("<p><!----></p>")
|
||||||
|
assert doc.children[0].children[0].comment?
|
||||||
|
doc = Hpricot("<p><!-- --></p>")
|
||||||
|
assert doc.children[0].children[0].comment?
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_body_newlines
|
||||||
|
@immob = Hpricot.parse(TestFiles::IMMOB)
|
||||||
|
body = @immob.at(:body)
|
||||||
|
{'background' => '', 'bgcolor' => '#ffffff', 'text' => '#000000', 'marginheight' => '10',
|
||||||
|
'marginwidth' => '10', 'leftmargin' => '10', 'topmargin' => '10', 'link' => '#000066',
|
||||||
|
'alink' => '#ff6600', 'hlink' => "#ff6600", 'vlink' => "#000000"}.each do |k, v|
|
||||||
|
assert_equal v, body[k]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_nested_twins
|
||||||
|
@doc = Hpricot("<div>Hi<div>there</div></div>")
|
||||||
|
assert_equal 1, (@doc/"div div").length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_wildcard
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal 3, (@basic/"*[@id]").length
|
||||||
|
assert_equal 3, (@basic/"//*[@id]").length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_javascripts
|
||||||
|
@immob = Hpricot.parse(TestFiles::IMMOB)
|
||||||
|
assert_equal 3, (@immob/:script)[0].inner_html.scan(/<LINK/).length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_nested_scripts
|
||||||
|
@week9 = Hpricot.parse(TestFiles::WEEK9)
|
||||||
|
assert_equal 14, (@week9/"a").find_all { |x| x.inner_html.include? "GameCenter" }.length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_uswebgen
|
||||||
|
@uswebgen = Hpricot.parse(TestFiles::USWEBGEN)
|
||||||
|
# sent by brent beardsley, hpricot 0.3 had problems with all the links.
|
||||||
|
assert_equal 67, (@uswebgen/:a).length
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_mangled_tags
|
||||||
|
[%{<html><form name='loginForm' method='post' action='/units/a/login/1,13088,779-1,00.html'?URL=></form></html>},
|
||||||
|
%{<html><form name='loginForm' ?URL= method='post' action='/units/a/login/1,13088,779-1,00.html'></form></html>},
|
||||||
|
%{<html><form name='loginForm'?URL= ?URL= method='post' action='/units/a/login/1,13088,779-1,00.html'?URL=></form></html>},
|
||||||
|
%{<html><form name='loginForm' method='post' action='/units/a/login/1,13088,779-1,00.html' ?URL=></form></html>}].
|
||||||
|
each do |str|
|
||||||
|
doc = Hpricot(str)
|
||||||
|
assert_equal 1, (doc/:form).length
|
||||||
|
assert_equal '/units/a/login/1,13088,779-1,00.html', doc.at("form")['action']
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_procins
|
||||||
|
doc = Hpricot("<?php print('hello') ?>\n<?xml blah='blah'?>")
|
||||||
|
assert_equal "php", doc.children[0].target
|
||||||
|
assert_equal "blah='blah'", doc.children[2].content
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_buffer_error
|
||||||
|
assert_raise Hpricot::ParseError, "ran out of buffer space on element <input>, starting on line 3." do
|
||||||
|
Hpricot(%{<p>\n\n<input type="hidden" name="__VIEWSTATE" value="#{(("X" * 2000) + "\n") * 22}" />\n\n</p>})
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_youtube_attr
|
||||||
|
str = <<-edoc
|
||||||
|
<html><body>
|
||||||
|
Lorem ipsum. Jolly roger, ding-dong sing-a-long
|
||||||
|
<object width="425" height="350">
|
||||||
|
<param name="movie" value="http://www.youtube.com/v/NbDQ4M_cuwA"></param>
|
||||||
|
<param name="wmode" value="transparent"></param>
|
||||||
|
<embed src="http://www.youtube.com/v/NbDQ4M_cuwA"
|
||||||
|
type="application/x-shockwave-flash" wmode="transparent" width="425" height="350">
|
||||||
|
</embed>
|
||||||
|
</object>
|
||||||
|
Check out my posting, I have bright mice in large clown cars.
|
||||||
|
<object width="425" height="350">
|
||||||
|
<param name="movie" value="http://www.youtube.com/v/foobar"></param>
|
||||||
|
<param name="wmode" value="transparent"></param>
|
||||||
|
<embed src="http://www.youtube.com/v/foobar"
|
||||||
|
type="application/x-shockwave-flash" wmode="transparent" width="425" height="350">
|
||||||
|
</embed>
|
||||||
|
</object>
|
||||||
|
</body></html?
|
||||||
|
edoc
|
||||||
|
doc = Hpricot(str)
|
||||||
|
assert_equal "http://www.youtube.com/v/NbDQ4M_cuwA",
|
||||||
|
doc.at("//object/param[@value='http://www.youtube.com/v/NbDQ4M_cuwA']")['value']
|
||||||
|
end
|
||||||
|
|
||||||
|
# ticket #84 by jamezilla
|
||||||
|
def test_screwed_xmlns
|
||||||
|
doc = Hpricot(<<-edoc)
|
||||||
|
<?xml:namespace prefix = cwi />
|
||||||
|
<html><body>HAI</body></html>
|
||||||
|
edoc
|
||||||
|
assert_equal "HAI", doc.at("body").inner_text
|
||||||
|
end
|
||||||
|
|
||||||
|
# Reported by Jonathan Nichols on the Hpricot list (24 May 2007)
|
||||||
|
def test_self_closed_form
|
||||||
|
doc = Hpricot(<<-edoc)
|
||||||
|
<body>
|
||||||
|
<form action="/loginRegForm" name="regForm" method="POST" />
|
||||||
|
<input type="button">
|
||||||
|
</form>
|
||||||
|
</body>
|
||||||
|
edoc
|
||||||
|
assert_equal "button", doc.at("//form/input")['type']
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_filters
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
assert_equal 0, (@basic/"title:parent").size
|
||||||
|
assert_equal 3, (@basic/"p:parent").size
|
||||||
|
assert_equal 1, (@basic/"title:empty").size
|
||||||
|
assert_equal 1, (@basic/"p:empty").size
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_keep_cdata
|
||||||
|
str = %{<script> /*<![CDATA[*/
|
||||||
|
/*]]>*/ </script>}
|
||||||
|
assert_equal str, Hpricot(str).to_html
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_namespace
|
||||||
|
chunk = <<-END
|
||||||
|
<a xmlns:t="http://www.nexopia.com/dev/template">
|
||||||
|
<t:sam>hi </t:sam>
|
||||||
|
</a>
|
||||||
|
END
|
||||||
|
doc = Hpricot::XML(chunk)
|
||||||
|
assert (doc/"//t:sam").size > 0 # at least this should probably work
|
||||||
|
# assert (doc/"//sam").size > 0 # this would be nice
|
||||||
|
end
|
||||||
|
end
|
@ -0,0 +1,16 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
require 'test/unit'
|
||||||
|
require 'hpricot'
|
||||||
|
require 'load_files'
|
||||||
|
|
||||||
|
class TestParser < Test::Unit::TestCase
|
||||||
|
def test_roundtrip
|
||||||
|
@basic = Hpricot.parse(TestFiles::BASIC)
|
||||||
|
%w[link link[2] body #link1 a p.ohmy].each do |css_sel|
|
||||||
|
ele = @basic.at(css_sel)
|
||||||
|
assert_equal ele, @basic.at(ele.css_path)
|
||||||
|
assert_equal ele, @basic.at(ele.xpath)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
@ -0,0 +1,66 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
require 'test/unit'
|
||||||
|
require 'hpricot'
|
||||||
|
require 'load_files'
|
||||||
|
|
||||||
|
class TestPreserved < Test::Unit::TestCase
|
||||||
|
def assert_roundtrip str
|
||||||
|
doc = Hpricot(str)
|
||||||
|
yield doc if block_given?
|
||||||
|
str2 = doc.to_original_html
|
||||||
|
[*str].zip([*str2]).each do |s1, s2|
|
||||||
|
assert_equal s1, s2
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def assert_html str1, str2
|
||||||
|
doc = Hpricot(str2)
|
||||||
|
yield doc if block_given?
|
||||||
|
assert_equal str1, doc.to_original_html
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_simple
|
||||||
|
str = "<p>Hpricot is a <b>you know <i>uh</b> fine thing.</p>"
|
||||||
|
assert_html str, str
|
||||||
|
assert_html "<p class=\"new\">Hpricot is a <b>you know <i>uh</b> fine thing.</p>", str do |doc|
|
||||||
|
(doc/:p).set('class', 'new')
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_parent
|
||||||
|
str = "<html><base href='/'><head><title>Test</title></head><body><div id='wrap'><p>Paragraph one.</p><p>Paragraph two.</p></div></body></html>"
|
||||||
|
assert_html str, str
|
||||||
|
assert_html "<html><base href='/'><body><div id=\"all\"><div><p>Paragraph one.</p></div><div><p>Paragraph two.</p></div></div></body></html>", str do |doc|
|
||||||
|
(doc/:head).remove
|
||||||
|
(doc/:div).set('id', 'all')
|
||||||
|
(doc/:p).wrap('<div></div>')
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_escaping_of_contents
|
||||||
|
doc = Hpricot(TestFiles::BOINGBOING)
|
||||||
|
assert_equal "Fukuda\342\200\231s Automatic Door opens around your body as you pass through it. The idea is to save energy and keep the room clean.", doc.at("img[@alt='200606131240']").next.to_s.strip
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_files
|
||||||
|
assert_roundtrip TestFiles::BASIC
|
||||||
|
assert_roundtrip TestFiles::BOINGBOING
|
||||||
|
assert_roundtrip TestFiles::CY0
|
||||||
|
end
|
||||||
|
|
||||||
|
def test_escaping_of_attrs
|
||||||
|
# ampersands in URLs
|
||||||
|
str = %{<a href="http://google.com/search?q=hpricot&amp;l=en">Google</a>}
|
||||||
|
link = (doc = Hpricot(str)).at(:a)
|
||||||
|
assert_equal "http://google.com/search?q=hpricot&l=en", link['href']
|
||||||
|
assert_equal "http://google.com/search?q=hpricot&l=en", link.attributes['href']
|
||||||
|
assert_equal "http://google.com/search?q=hpricot&l=en", link.get_attribute('href')
|
||||||
|
assert_equal "http://google.com/search?q=hpricot&amp;l=en", link.raw_attributes['href']
|
||||||
|
assert_equal str, doc.to_html
|
||||||
|
|
||||||
|
# alter the url
|
||||||
|
link['href'] = "javascript:alert(\"AGGA-KA-BOO!\")"
|
||||||
|
assert_equal %{<a href="javascript:alert(&quot;AGGA-KA-BOO!&quot;)">Google</a>}, doc.to_html
|
||||||
|
end
|
||||||
|
end
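test_escaping_of_attrs checks that an ampersand in a URL survives parsing and that quotes in a newly assigned attribute value are escaped on output. The sketch below shows the raw-versus-decoded distinction with the standard library only. It is an illustration under assumptions, not Hpricot's code; `decode_attr` and `encode_attr` are hypothetical names.

```ruby
require 'cgi'

# Hypothetical helper: turn the value as written in the markup
# (entities intact) into the decoded value a caller works with.
def decode_attr(raw)
  CGI.unescapeHTML(raw)
end

# Hypothetical helper: turn a decoded value back into the form that is
# safe to write inside a double-quoted attribute.
def encode_attr(value)
  CGI.escapeHTML(value)
end
```

Decoding on read and encoding on write means an untouched attribute round-trips byte for byte, while a value set from Ruby cannot break out of its quotes.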
|
@ -0,0 +1,28 @@
|
|||||||
|
#!/usr/bin/env ruby
|
||||||
|
|
||||||
|
require 'test/unit'
|
||||||
|
require 'hpricot'
|
||||||
|
require 'load_files'
|
||||||
|
|
||||||
|
class TestParser < Test::Unit::TestCase
|
||||||
|
# normally, the link tags are empty HTML tags.
|
||||||
|
# contributed by laudney.
|
||||||
|
def test_normally_empty
|
||||||
|
doc = Hpricot::XML("<rss><channel><title>this is title</title><link>http://fake.com</link></channel></rss>")
|
||||||
|
assert_equal "this is title", (doc/:rss/:channel/:title).text
|
||||||
|
assert_equal "http://fake.com", (doc/:rss/:channel/:link).text
|
||||||
|
end
|
||||||
|
|
||||||
|
# make sure XML doesn't get downcased
|
||||||
|
def test_casing
|
||||||
|
doc = Hpricot::XML(TestFiles::WHY)
|
||||||
|
assert_equal "hourly", (doc.at "sy:updatePeriod").inner_html
|
||||||
|
assert_equal 1, (doc/"guid[@isPermaLink]").length
|
||||||
|
end
|
||||||
|
|
||||||
|
# be sure tags named "text" are ok
|
||||||
|
def test_text_tags
|
||||||
|
doc = Hpricot::XML("<feed><title>City Poisoned</title><text>Rita Lee has poisoned Brazil.</text></feed>")
|
||||||
|
assert_equal "City Poisoned", (doc/"title").text
|
||||||
|
end
|
||||||
|
end
|