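Found this online~~~, or more precisely, pieced it together from online sources. It combines the work of two experts: http://www.schurpf.com/google-pagerank-python/ and http://www.cnpythoner.com/post/190.html. There is obviously still plenty of room to improve this program, but this is as far as I could take it. It was my first time playing with Python, and getting it to this point took me several weeks, which was no small feat.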
```python
# Python 3 version of the script (the original was Python 2)
import re
import time
import urllib.request

PR_URL = ('http://toolbarqueries.google.com/tbr'
          '?client=navclient-auto&ch=%s&features=Rank&q=info:%s')

SEED = ("Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. "
        "Yes, I'm talking to you, scammer.")


def get_url(url):
    """Return the host of an http(s) URL (not used below)."""
    host_re = re.compile(r'^https?://([^/]*)', re.IGNORECASE)
    return host_re.search(url).group(1)


def get_hash(url):
    """Compute the checksum sent in the ch= parameter."""
    result = 0x01020345
    for i, ch in enumerate(url):
        result ^= ord(SEED[i % len(SEED)]) ^ ord(ch)
        # rotate the 32-bit state left by 9 bits
        result = (result >> 23 | result << 9) & 0xffffffff
    return '8%x' % result


def get_pagerank(url):
    query = PR_URL % (get_hash(url), url)
    with urllib.request.urlopen(query, timeout=10) as resp:
        info = resp.read().decode('utf-8')
    # The reply looks like "Rank_1:1:5"; take everything after the last
    # colon so that a PR of 10 is not truncated to "1".
    prnum = info.strip().rsplit(':', 1)[-1]
    print(prnum)
    return prnum


with open(r'D:\info7.txt') as src, open(r'D:\pr7.txt', 'w') as out:
    for line in src:
        murl = line.strip()
        try:
            prnum = get_pagerank(murl)
        except Exception:
            prnum = -1
        out.write('%s,%s\n' % (murl, prnum))
        time.sleep(5)  # pause between queries
```
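Google has since shut down the public toolbar PageRank service, so the network part of the script can't really be tested anymore. The parts that don't need the network can still be checked offline. Below is a minimal sketch of two such helpers. `host_of` and `parse_pr_response` are hypothetical names, and the reply format `Rank_1:1:<PR>` is an assumption inferred from the original `cinfo[9:10]` slice.

```python
import re


def host_of(url):
    """Host part of an http(s) URL, with or without a trailing path.

    Hypothetical helper: the original get_url sliced group(0)[7:-1],
    which breaks for https:// URLs and URLs without a trailing slash.
    """
    m = re.match(r'^https?://([^/]*)', url, re.IGNORECASE)
    return m.group(1) if m else None


def parse_pr_response(body):
    """Extract the PR from a reply assumed to look like 'Rank_1:1:5'.

    Returns None for an empty reply or a non-numeric value.
    """
    body = body.strip()
    if not body:
        return None
    value = body.rsplit(':', 1)[-1]
    return int(value) if value.isdigit() else None


print(host_of('https://www.example.com/a/b'))
print(parse_pr_response('Rank_1:1:10\n'))
```

Splitting on the last colon instead of slicing a fixed position also handles two-digit values.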
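The interesting thing about this code is the string in the middle: "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer." In other words: scraping PageRank violates Google's terms of service, and yes, the scammer they mean is you! As far as I can tell, the code uses this string to compute a key (a checksum), which is then inserted into the URL to query the PageRank. I just don't get why they chose this particular sentence....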
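As far as the algorithm is concerned, the sentence is only a fixed seed (a salt). Its characters are XORed into a 32-bit state one by one, so any fixed string would work as long as the server uses the same one. A plausible guess is that Google picked this wording so anyone reverse-engineering the toolbar would run into the warning. The sketch below repeats the mixing loop from `GetHash` with the seed turned into a parameter, so you can see that changing the seed only changes the checksum. `toolbar_checksum` is a hypothetical name, not part of the original script.

```python
SEED = ("Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. "
        "Yes, I'm talking to you, scammer.")


def toolbar_checksum(url, seed=SEED):
    """Same mixing loop as GetHash, with the seed as a parameter."""
    result = 0x01020345
    for i, ch in enumerate(url):
        # mix in one seed character and one URL character
        result ^= ord(seed[i % len(seed)]) ^ ord(ch)
        # rotate the 32-bit state left by 9 bits
        result = ((result << 9) | (result >> 23)) & 0xFFFFFFFF
    return '8%x' % result


url = 'http://www.google.com/'
print(toolbar_checksum(url))                        # real seed
print(toolbar_checksum(url, seed='X' + SEED[1:]))   # one character changed
```

Only a checksum built with the exact seed the server expects would be accepted, which is why the script has to embed the sentence verbatim.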