Useful scripts with mechanize, hpricot, and htmldoc
Here are some useful Ruby scripts using mechanize, hpricot, and htmldoc.
First, install the mechanize, hpricot, and htmldoc (for PDFs) gems:
gem install hpricot
gem install mechanize
gem install htmldoc
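Note that the htmldoc gem is only a wrapper around the separate HTMLDOC command-line program, which has to be installed on its own. If it is missing, generating a PDF fails with the "Invalid program path: htmldoc" error described in the comments below. If the program is installed somewhere the gem doesn't look, you can point the gem at it with PDF::HTMLDoc.program_path. Here's a minimal sketch for finding it on your PATH. find_executable_in_path is a hypothetical helper of my own, not part of the gem.

```ruby
# Hypothetical helper, not part of the htmldoc gem: returns the full path of
# the first executable file named +name+ in the directories listed in
# +search_path+ (ENV['PATH'] by default), or nil if none is found.
def find_executable_in_path(name, search_path = ENV['PATH'].to_s)
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Usage sketch (needs the htmldoc gem and the HTMLDOC program):
#   require 'htmldoc'
#   path = find_executable_in_path('htmldoc')
#   abort 'Install the HTMLDOC program first' unless path
#   PDF::HTMLDoc.program_path = path
```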
Save a web page as a PDF with mechanize and htmldoc
require 'rubygems'
require 'mechanize'
require 'htmldoc'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac FireFox'
agent.redirect_ok = true
page = agent.get('http://scottmotte.com/archive')
pdf = PDF::HTMLDoc.new
pdf.set_option :outfile, File.expand_path("~/Desktop/outfile.pdf") # expand "~" explicitly
pdf.set_option :bodycolor, :white
pdf.set_option :links, true
pdf << page.body
if pdf.generate
  puts "Hallelujah"
else
  puts 'No Joy'
end
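If you save more than one page, a fixed outfile.pdf gets overwritten every time. A small sketch, assuming you want one PDF per URL: pdf_filename_for is a hypothetical helper (not from the script above) that builds a filename from the page URL and expands "~" itself, so the output path doesn't depend on a shell doing it.

```ruby
require 'uri'

# Hypothetical helper: turns a page URL into a PDF path, e.g.
# "http://scottmotte.com/archive" -> "<dir>/scottmotte.com_archive.pdf".
# File.expand_path resolves "~" to your home directory.
def pdf_filename_for(url, dir = '~/Desktop')
  uri = URI.parse(url)
  # Replace anything that isn't a letter, digit, dot, or hyphen with "_",
  # then drop trailing underscores (e.g. from a trailing "/").
  slug = "#{uri.host}#{uri.path}".gsub(/[^A-Za-z0-9.\-]+/, '_').sub(/_+\z/, '')
  slug = 'page' if slug.empty?
  File.expand_path(File.join(dir, "#{slug}.pdf"))
end

# Usage in the script above:
#   pdf.set_option :outfile, pdf_filename_for('http://scottmotte.com/archive')
```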
Log in to a website with mechanize, scrape some data with hpricot, and insert it into another website
I wanted to save my stock values automatically each day and insert them into my personal application, where I graph them daily. I was tired of doing this by hand, so I used mechanize and hpricot to automate the process. (Now I just need to schedule it as a cron job.)
require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'open-uri'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac FireFox'
agent.redirect_ok = true
## Log in to Scottrade and scrape the balances ##
page = agent.get('http://scottrade.com')
login_form = page.forms.first
login_form.account = '*****your account number*****'
login_form.password = '*****your password*****'
page = agent.submit(login_form)
link = page.links.text("My Account")
page = link.click
doc = Hpricot.parse(page.body)
# positions
positions = (doc/'span#DetailedBalances_lblCashStocksMutFdsCDBonds').inner_html
positions = positions.gsub( "$", '') #strip the dollar sign
positions = positions.gsub( ",", '') #strip the commas
# cash
cash = (doc/'span#DetailedBalances_lblCashTotalMoneyBalance').inner_html
cash = cash.gsub( "$", '')
cash = cash.gsub( ",", '')
# total
total = (doc/'span#DetailedBalances_lblCashTotalAcctValue').inner_html
total = total.gsub("$", '')
total = total.gsub(",", '')
page = agent.get('/Default.aspx?log=off') #log out
# Now log in to my app and insert the scraped values
page = agent.get('http://app.scottmotte.com')
login_form = page.forms.first
login_form.login = '***my username***'
login_form.password = '***my password***'
page = agent.submit(login_form)
page = agent.get('/daily_stock_values/new')
f = page.forms.first
f.set_fields( 'fund_value[positions]' => positions)
f.set_fields( 'fund_value[cash]' => cash)
f.set_fields( 'fund_value[total]' => total)
# No date field is needed: the app is built in Rails, which sets created_at automatically.
page = agent.submit(f)
page = agent.get('/logout')
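Each balance in this script goes through the same pair of gsub calls. Here's a sketch of folding that into one hypothetical helper, clean_amount, which removes both the dollar signs and the commas with a single regular expression. Unlike the original script, it also trims surrounding whitespace, which scraped inner_html may contain.

```ruby
# Hypothetical helper: removes "$" and thousands separators from a scraped
# amount and trims whitespace, e.g. "$12,345.67" -> "12345.67".
def clean_amount(text)
  text.to_s.gsub(/[$,]/, '').strip
end

# In the script, each pair of gsub calls could then become one line, e.g.:
#   cash = clean_amount((doc/'span#DetailedBalances_lblCashTotalMoneyBalance').inner_html)
```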
Get an image from a website and save it locally
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.redirect_ok = true
page = agent.get("http://intype.info/home/trailers/alpha_3/bg.png")
# The URL points to a PNG, so save it with a .png extension, in binary mode
File.open("mypicture.png", "wb") do |file|
  file.write(page.body)
end
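The URL in this example points to a PNG, so a hard-coded file extension is easy to get wrong. If you download images whose type you don't know in advance, a hypothetical helper like image_extension can guess the extension from the standard signature bytes at the start of PNG, GIF, and JPEG files instead.

```ruby
# Hypothetical helper: guesses an image file extension from the first bytes
# of the data. Returns nil for formats it does not recognize.
def image_extension(data)
  bytes = data.to_s.b # compare raw bytes, whatever the string's encoding
  if bytes.start_with?("\x89PNG\r\n\x1A\n".b)
    'png'
  elsif bytes.start_with?('GIF87a', 'GIF89a')
    'gif'
  elsif bytes.start_with?("\xFF\xD8\xFF".b)
    'jpg'
  end
end

# Usage with the script above:
#   ext = image_extension(page.body) || 'bin'
#   File.open("mypicture.#{ext}", 'wb') { |f| f.write(page.body) }
```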

Mike | October 10, 2008
Hi Scott,
Thank you for the examples.
When trying to run the first example I get the error:
/opt/local/lib/ruby/gems/1.8/gems/htmldoc-0.2.1/lib/htmldoc.rb:182:in `execute': Invalid program path: htmldoc (PDF::HTMLDocException)
from /opt/local/lib/ruby/gems/1.8/gems/htmldoc-0.2.1/lib/htmldoc.rb:154:in `generate'
from pdf.rb:18
I’m using:
htmldoc (0.2.1)
Rails 2.1.1
ruby 1.8.7
on Mac OS X 10.5.4.
I could not google that error. It actually occurs with every htmldoc script I run.
Do you have any advice?
Thanks!
Mike
Mike | October 10, 2008
I fixed the problem by setting the program path:
PDF::HTMLDoc.program_path="/opt/local/lib/ruby/gems/1.8/gems/htmldoc-0.2.1/lib/htmldoc"
Now the program runs, but the result is "No Joy".
page.body does have content, however no pdf is created.
Insight appreciated, thanks!
Mike | October 10, 2008
Hi again, ehm… ignore the message above. To generate PDF documents, I had to build htmldoc from source.
Peter | November 12, 2008
Thank you Scott, your examples have been a great help.
Peter