Useful scripts with mechanize, hpricot, and htmldoc

Here are some useful Ruby scripts that use mechanize, hpricot, and htmldoc.

First, install the hpricot, mechanize, and htmldoc gems (htmldoc is used to generate the PDFs):

gem install hpricot
gem install mechanize
gem install htmldoc
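As the comments at the end show, the htmldoc gem is only a wrapper. It runs the separate htmldoc program and fails with "Invalid program path: htmldoc" when it can't find it. Here's a minimal sketch for checking whether an htmldoc executable is on your PATH before running the PDF script. `find_executable_in_path` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: look for an executable called `name` in each
# directory of a PATH-style string (defaults to the real PATH).
# Unix-style only; on Windows you'd also need to try "name.exe".
def find_executable_in_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

htmldoc = find_executable_in_path('htmldoc')
if htmldoc
  puts "htmldoc found at #{htmldoc}"
  # If the gem still can't find it, point the gem at the binary, e.g.:
  #   PDF::HTMLDoc.program_path = htmldoc
else
  puts 'htmldoc not found on PATH; install the htmldoc program, not just the gem'
end
```

If it prints a path but the gem still complains, that path is presumably what belongs in `PDF::HTMLDoc.program_path=` (the setter Mike uses in the comments): the htmldoc program itself, not a directory inside the gem.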

Save a web page as a PDF with mechanize and htmldoc

require 'rubygems'
require 'mechanize'
require 'htmldoc'

# WWW::Mechanize is the pre-1.0 mechanize class name; newer versions use Mechanize.new
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac FireFox'
agent.redirect_ok = true

page = agent.get('http://scottmotte.com/archive')
pdf = PDF::HTMLDoc.new

# Expand "~" ourselves; Ruby leaves it alone in a plain string
pdf.set_option :outfile, File.expand_path("~/Desktop/outfile.pdf")
pdf.set_option :bodycolor, :white
pdf.set_option :links, true

pdf << page.body

if pdf.generate
  puts "Hallelujah"
else
  puts 'No Joy'
end
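If the gem keeps getting in the way, you can also call the htmldoc program yourself from Ruby. This is a sketch under a few assumptions: the helper names are made up, and it assumes htmldoc's `--webpage`, `-t pdf`, `--outfile`, `--bodycolor`, and `--links`/`--no-links` options, plus `-` meaning "read the HTML from standard input". Check those against `htmldoc --help` for your version.

```ruby
require 'open3'

# Hypothetical helper: build the argument list for the htmldoc program,
# mirroring the three options set in the script above.
def htmldoc_args(outfile, bodycolor: 'white', links: true)
  args = ['htmldoc', '--webpage', '-t', 'pdf',
          '--outfile', File.expand_path(outfile),
          '--bodycolor', bodycolor]
  args << (links ? '--links' : '--no-links')
  args << '-' # assumed: "-" makes htmldoc read HTML from stdin
end

# Hypothetical helper: pipe the HTML into htmldoc and report success.
# Raises Errno::ENOENT if no htmldoc program can be found at all.
def html_to_pdf(html, outfile)
  _out, err, status = Open3.capture3(*htmldoc_args(outfile), stdin_data: html)
  warn err unless status.success?
  status.success?
end
```

In the script above, `html_to_pdf(page.body, '~/Desktop/outfile.pdf')` would stand in for the `PDF::HTMLDoc` lines, and its true/false result could drive the same "Hallelujah" / "No Joy" message.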

Log into a website with mechanize, save some data with hpricot, and then insert that data into another website

For this one, I wanted to save my stock values automatically each day and insert them into my personal application, where I graph them daily. I was tired of doing this by hand, so I used mechanize and hpricot to automate the process. (I still need to schedule it to run as a cron job.)
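Since the goal is to run this unattended from cron, you may not want your account number and passwords sitting in the script. One option, sketched here with a hypothetical `required_env` helper and made-up variable names, is to read them from environment variables that you set wherever the cron job is launched:

```ruby
# Hypothetical helper: fetch a required setting from the environment and
# fail loudly if it is missing or empty, instead of logging in with nil.
def required_env(name, env = ENV)
  value = env[name]
  raise KeyError, "please set #{name}" if value.nil? || value.empty?
  value
end

# In the script, the login lines could then read (variable names made up):
#   login_form.account  = required_env('SCOTTRADE_ACCOUNT')
#   login_form.password = required_env('SCOTTRADE_PASSWORD')
```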

require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'open-uri'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac FireFox'
agent.redirect_ok = true

## FIRST TIME ##
page = agent.get('http://scottrade.com')

login_form = page.forms.first
login_form.account = '*****your account number*****'
login_form.password = '*****your password*****'
page = agent.submit(login_form)

link = page.links.text("My Account")
page = link.click

doc = Hpricot.parse(page.body)

# positions
positions = (doc/'span#DetailedBalances_lblCashStocksMutFdsCDBonds').inner_html
positions = positions.gsub( "$", '') #strip the dollar sign
positions = positions.gsub( ",", '') #strip the commas

# cash
cash = (doc/'span#DetailedBalances_lblCashTotalMoneyBalance').inner_html
cash = cash.gsub( "$", '')
cash = cash.gsub( ",", '')

# total
total = (doc/'span#DetailedBalances_lblCashTotalAcctValue').inner_html
total = total.gsub("$", '')
total = total.gsub(",", '')

page = agent.get('/Default.aspx?log=off') #log out

# now insert that content
page = agent.get('http://app.scottmotte.com')

login_form = page.forms.first
login_form.login = '***my username***'
login_form.password = '***my password***'
page = agent.submit(login_form)

page = agent.get('/daily_stock_values/new')

f = page.forms.first
f.set_fields( 'fund_value[positions]' => positions)
f.set_fields( 'fund_value[cash]' => cash)
f.set_fields( 'fund_value[total]' => total)
# The date is filled in automatically by the Rails created_at timestamp (the app is built in Rails).
page = agent.submit(f)

page = agent.get('/logout')
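The three blocks that strip "$" and "," from each balance all do the same thing. A possible tidier version is sketched below with hypothetical helpers (`strip_currency`, `scrape_balances`, `fund_value_fields`); the span ids and form field names are the ones from the script. `String#delete('$,')` removes both characters in one call, and Mechanize's `Form#set_fields` accepts a hash with several fields at once.

```ruby
# Hypothetical helper: does what each pair of gsub calls in the script
# does, removing every "$" and "," so "$12,345.67" becomes "12345.67".
def strip_currency(text)
  text.to_s.strip.delete('$,')
end

# The span ids from the script, keyed by the form field each one feeds.
BALANCE_SPANS = {
  'positions' => 'span#DetailedBalances_lblCashStocksMutFdsCDBonds',
  'cash'      => 'span#DetailedBalances_lblCashTotalMoneyBalance',
  'total'     => 'span#DetailedBalances_lblCashTotalAcctValue'
}.freeze

# Given the Hpricot doc, return a hash like {"positions" => "12345.67"},
# with one cleaned value per span.
def scrape_balances(doc)
  BALANCE_SPANS.each_with_object({}) do |(field, selector), values|
    values[field] = strip_currency((doc / selector).inner_html)
  end
end

# Turn those values into the field names the Rails form expects,
# e.g. {"fund_value[positions]" => "12345.67"}.
def fund_value_fields(values)
  values.each_with_object({}) do |(field, value), fields|
    fields["fund_value[#{field}]"] = value
  end
end

# In the script:
#   values = scrape_balances(doc)
#   f.set_fields(fund_value_fields(values))
```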

Get an image from a website and save it locally

require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'open-uri'
require 'uri'

@agent = WWW::Mechanize.new
@agent.user_agent_alias = 'Mac Safari'
@agent.redirect_ok = true
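page = @agent.get("http://intype.info/home/trailers/alpha_3/bg.png")
# The image is a PNG, so save it with a .png extension.
# "wb" writes the raw bytes, and the block form closes the file for us.
File.open("mypicture.png", "wb") do |file|
  file.write(page.body)
end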
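A slightly more defensive version of the image download, sketched with two hypothetical helpers: `local_name_for` takes the extension from the URL so the saved file's extension always matches what was downloaded, and `save_bytes` writes the bytes with `File.binwrite`, which opens the file in binary mode and closes it for you.

```ruby
require 'uri'

# Hypothetical helper: build a local file name whose extension comes from
# the URL's path (query strings are ignored).
def local_name_for(url, basename = 'mypicture')
  basename + File.extname(URI(url).path)
end

# Hypothetical helper: write raw bytes in binary mode.
def save_bytes(bytes, filename)
  File.binwrite(filename, bytes)
end

# With the agent from the script:
#   url = "http://intype.info/home/trailers/alpha_3/bg.png"
#   save_bytes(@agent.get(url).body, local_name_for(url))
```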

Comments

  1. Mike | October 10, 2008

    Hi Scott,

    Thank you for the examples.

    When trying to run the first example I get the error:

    /opt/local/lib/ruby/gems/1.8/gems/htmldoc-0.2.1/lib/htmldoc.rb:182:in `execute': Invalid program path: htmldoc (PDF::HTMLDocException)
    from /opt/local/lib/ruby/gems/1.8/gems/htmldoc-0.2.1/lib/htmldoc.rb:154:in `generate'
    from pdf.rb:18

    I’m using:
    htmldoc (0.2.1)
    Rails 2.1.1
    ruby 1.8.7

    on Mac OS X 10.5.4.

    I could not google that error. It actually occurs with every htmldoc script I run.

    Do you have any advice?

    Thanks!
    Mike

  2. Mike | October 10, 2008

    I fixed the problem by setting the program path:

    PDF::HTMLDoc.program_path = "/opt/local/lib/ruby/gems/1.8/gems/htmldoc-0.2.1/lib/htmldoc"

    Now the program runs, although it ends with "No Joy".

    page.body does have content, however no pdf is created.

    Insight appreciated, thanks!

  3. Mike | October 10, 2008

    Hi again, ehm… ignore the message above. To generate PDF documents I had to build htmldoc from source.

  4. Peter | November 12, 2008

    Thank you Scott, your examples have been a great help.

    Peter