Dicky's Space

How to Batch-Fetch Google PR (PageRank) Values with Python

May 30, 2013

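Found this online~~~, or to be precise, cobbled together from online sources: it combines the work of two gurus, http://www.schurpf.com/google-pagerank-python/ and http://www.cnpythoner.com/post/190.html. The program obviously still has plenty of room for improvement, but sadly this is as far as my skills go. It's my first time playing with Python, after all, and getting it this far took me several weeks, which was no small feat.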

# Python 2.7 script: batch-query Google's toolbar PageRank service.
import re, urllib, time

prhost = 'toolbarqueries.google.com'
prpath = '/tbr?client=navclient-auto&ch=%s&features=Rank&q=info:%s'

def get_url(url):
    # Return only the host part of an http(s) URL, e.g. 'www.example.com'.
    host_re = re.compile(r'^https?://([^/]+)', re.IGNORECASE)
    m = host_re.search(url)
    return m.group(1) if m else url

def GetHash(url):
    # Toolbar checksum: XOR each URL character with the (cycled) seed string,
    # then rotate the 32-bit accumulator left by 9 bits.
    SEED = "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer."
    Result = 0x01020345
    for i in range(len(url)):
        Result ^= ord(SEED[i % len(SEED)]) ^ ord(url[i])
        Result = (Result >> 23 | Result << 9) & 0xffffffff
    return '8%x' % Result

def GetPageRank(url):
    # The reply looks like 'Rank_1:1:5'; the PageRank is the field after the
    # last colon (this also handles PR 10). An empty reply means no PR data.
    hosturl = 'http://' + prhost + prpath % (GetHash(url), url)
    info = urllib.urlopen(hosturl).read().strip()
    prnum = info.split(':')[-1] if info else -1
    print prnum
    return prnum

# Input: one URL per line; output: "url,pagerank" lines (-1 on failure).
with open(r'D:\info7.txt') as src, open(r'D:\pr7.txt', 'w') as f:
    for m in src:
        murl = m.strip()
        if not murl:
            continue
        # murl = get_url(murl)  # optional: query the bare host instead
        try:
            prnum = GetPageRank(murl)
        except Exception:
            prnum = -1  # network error, blocked request, etc.
        f.write("%s,%s\n" % (murl, prnum))
        time.sleep(5)  # throttle requests so Google doesn't block the IP
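Reading the PageRank at a fixed character offset only works for single-digit values (a PR 10 reply would come back as `1`). Here is a minimal sketch of a more tolerant parser, assuming the toolbar reply has the form `Rank_1:1:5` with the PageRank as the last colon-separated field, and an empty reply when no value exists. `parse_pr` is a hypothetical helper name, and it expects an already-decoded text string.

```python
def parse_pr(reply):
    """Extract the PageRank from a toolbar reply such as 'Rank_1:1:5'.

    Assumes the value is the field after the last colon; returns -1 when
    the reply is empty or that field is not a number.
    """
    reply = reply.strip()
    if not reply:
        return -1  # no PageRank data for this URL
    value = reply.rsplit(':', 1)[-1]
    return int(value) if value.isdigit() else -1


print(parse_pr('Rank_1:1:5\n'))  # 5
print(parse_pr('Rank_1:2:10'))   # 10
print(parse_pr(''))              # -1
print('Rank_1:2:10'[9:10])       # 1, a fixed-offset slice truncates PR 10
```

Splitting on the last colon keeps working no matter how many digits the value has.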
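What's interesting about this code is the string in the middle: "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer." In Chinese I'd render it as "scraping PR values violates Google's user agreement; yes, I mean you, you jerk!" Well, as far as I can tell, this part of the code computes a key (a checksum of the URL), which is then spliced into the query URL to look up the PageRank. Mathematically the sentence is just the XOR key for that checksum, so any fixed string would do as long as the toolbar and Google's server agree on it; I just don't get why they picked this particular sentence...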
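Going by the code, the sentence only serves as an XOR key: each URL character is XORed with the matching (cycled) seed character, and the 32-bit accumulator is rotated left by 9 bits after every step. Below is a sketch that restates `GetHash` in a version-neutral form and shows how the checksum is spliced into the query URL. It assumes the seed, the start value `0x01020345` and the `8` prefix are exactly those of the script above; `rotl32`, `toolbar_checksum` and `build_query` are hypothetical helper names, not part of any library.

```python
# Version-neutral restatement of GetHash (hypothetical helper names).
SEED = ("Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. "
        "Yes, I'm talking to you, scammer.")
PR_QUERY = ('http://toolbarqueries.google.com/tbr?client=navclient-auto'
            '&ch=%s&features=Rank&q=info:%s')


def rotl32(x, n):
    """Rotate a 32-bit unsigned integer left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def toolbar_checksum(url, seed=SEED):
    """Same arithmetic as GetHash: XOR with the cycled seed, then rotate left 9."""
    h = 0x01020345
    for i, ch in enumerate(url):
        h ^= ord(seed[i % len(seed)]) ^ ord(ch)
        h = rotl32(h, 9)
    return '8%x' % h


def build_query(url):
    """Splice the checksum and the raw URL into the toolbar query string."""
    return PR_QUERY % (toolbar_checksum(url), url)


print(build_query('http://www.example.com/'))
print(toolbar_checksum(''))  # 81020345 (no characters mixed in)
print(toolbar_checksum('a') != toolbar_checksum('a', seed='x'))  # True
```

Because XOR and rotation are both invertible, swapping the seed generally changes the checksum (for the one-character URL above it must, since the only step already differs). Presumably the wording of the sentence matters only in that both sides have to use the same string.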