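Found this online~~~. More precisely, I pieced it together online: it combines the work of two experts, http://www.schurpf.com/google-pagerank-python/ and http://www.cnpythoner.com/post/190.html. The program obviously still has plenty of room for improvement, but sadly this is as far as my skills go. This was my first time playing with Python, after all, and getting it to this point took me several weeks, which was no small feat.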
import re
import time
import urllib.request

prhost = 'toolbarqueries.google.com'
prpath = '/tbr?client=navclient-auto&ch=%s&features=Rank&q=info:%s'

def get_url(url):
    # Extract the host name from an http:// or https:// URL.
    host_re = re.compile(r'^https?://(.*?)($|/)', re.IGNORECASE)
    return host_re.search(url).group(1)

def GetHash(url):
    # Checksum sent in the ch= parameter: XOR each URL character with the
    # repeating seed, then rotate the 32-bit state left by 9 bits.
    SEED = "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer."
    Result = 0x01020345
    for i in range(len(url)):
        Result ^= ord(SEED[i % len(SEED)]) ^ ord(url[i])
        Result = Result >> 23 | Result << 9
        Result &= 0xffffffff
    return '8%x' % Result

def GetPageRank(url):
    keyinfo = GetHash(url)
    hosturl = 'http://%s%s' % (prhost, prpath % (keyinfo, url))
    info = urllib.request.urlopen(hosturl).read().decode('utf-8')
    # The reply is expected to look like "Rank_1:1:7"; take the value after the
    # last colon so a PR of 10 is not cut to "1". An empty reply yields ''.
    prnum = info.strip().split(':')[-1]
    print(prnum)
    return prnum

with open(r'D:\info7.txt') as src, open(r'D:\pr7.txt', 'w') as f:
    for m in src:
        murl = m.strip()
        # checkurl = get_url(murl)
        if not murl:
            continue  # skip blank lines in the input file
        try:
            prnum = GetPageRank(murl)
        except Exception:
            prnum = -1  # mark URLs whose lookup failed
        f.write('%s,%s\n' % (murl, prnum))
        # Pause after every request, failed or not, to avoid being throttled.
        time.sleep(5)
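What's interesting about this code is the sentence in the middle: "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer." In plain words: scraping PR values violates Google's user agreement, and yes, we mean you, you lowlife! Well, from my analysis, this part of the code computes a key (GetHash mixes the sentence into the URL character by character), which is then spliced into the URL as the ch parameter to query the PageRank. What puzzles me is
why it uses this particular sentence....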
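As far as I can tell, the sentence is not a secret key in any cryptographic sense; GetHash uses it as a fixed salt that repeats underneath the URL. The sketch below spells that out, assuming the GetHash logic above is correct. The names `rotl32`, `toolbar_checksum` and `build_query_url` are my own hypothetical helpers, and the shift pair `Result >> 23 | Result << 9` followed by `& 0xffffffff` is written out as a 32-bit left rotation by 9 bits. Since every seed character is XORed into the state, a different sentence gives a different `ch` value, which is presumably why the toolbar server only accepts checksums built from this exact text.

```python
# Hypothetical helper names (rotl32, toolbar_checksum, build_query_url);
# the seed, start value and 9-bit rotation are copied from GetHash above.
SEED = "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer."
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def toolbar_checksum(url, seed=SEED):
    """Same computation as GetHash, written step by step."""
    state = 0x01020345
    for i, ch in enumerate(url):
        # The seed is reused cyclically when the URL is longer than it.
        state ^= ord(seed[i % len(seed)]) ^ ord(ch)
        # Result >> 23 | Result << 9, masked to 32 bits, is this rotation.
        state = rotl32(state, 9)
    return '8%x' % state


def build_query_url(url):
    """Assemble the toolbar query URL without sending it."""
    return ('http://toolbarqueries.google.com/tbr?client=navclient-auto'
            '&ch=%s&features=Rank&q=info:%s' % (toolbar_checksum(url), url))


if __name__ == '__main__':
    demo = 'http://www.example.com/'
    print(toolbar_checksum(demo))
    print(toolbar_checksum(demo, seed='any other sentence'))
    print(build_query_url(demo))
```

Running it prints the checksum for the same URL under the original sentence and under a different one, plus the full request URL, without contacting any server, so the `ch` value can be compared against what GetHash produces.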