How to tell whether search engine spiders have crawled your site

Posted by IsaacZ on 2009-5-2 01:47:42

1. Check the site logs
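If you are on virtual hosting, the host's control panel usually has an option to enable IIS logging, so turn it on there. IIS logs are usually stored in the root of your FTP account, in a folder with a name like log×××. They are plain-text files with a .log extension. Download them and search for lines containing Baiduspider+. Each such line is a trace of a visit by Baidu's spider. Google's spider appears as Googlebot, and other engines use their own names, which you can look up yourself.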

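If you run your own dedicated server, open IIS Manager. Under the site's Web Site tab there is an "Enable logging" checkbox. Select it, then click Properties to specify where the log files are saved.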
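Searching the downloaded .log files by hand gets tedious once there are many of them. Here is a minimal sketch in Python that counts log lines containing a spider name. Python is a substitution on my part, since the post itself only uses ASP. The token-to-engine mapping and the function names are illustrative assumptions. Matching is a plain case-insensitive substring test, so any client whose User-Agent merely contains "googlebot" is counted as well.

```python
import os
from collections import Counter

# Illustrative mapping of User-Agent substrings to engine names (an assumption;
# extend it with the names of any other spiders you care about).
SPIDER_TOKENS = {
    "baiduspider": "Baidu",
    "googlebot": "Google",
}


def count_spider_hits(lines):
    """Count log lines whose text contains a known spider token."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip W3C header directives such as #Fields:
            continue
        lower = line.lower()
        for token, engine in SPIDER_TOKENS.items():
            if token in lower:
                counts[engine] += 1
                break  # count each line at most once
    return counts


def scan_log_dir(path):
    """Add up the counts over every .log file in a downloaded log folder."""
    total = Counter()
    for name in sorted(os.listdir(path)):
        if name.lower().endswith(".log"):
            full = os.path.join(path, name)
            with open(full, encoding="utf-8", errors="replace") as f:
                total += count_spider_hits(f)
    return total


if __name__ == "__main__":
    sample = [
        "#Fields: date time ...",
        "2008-04-13 02:07:13 W3SVC314147887 125.32.112.38 GET /index.html - 80 - "
        "220.181.38.174 Baiduspider+(+http://www.baidu.com/search/spider.htm) 304 0 0",
    ]
    print(count_spider_hits(sample))  # Counter({'Baidu': 1})
```

Point `scan_log_dir` at the folder you downloaded to get per-engine totals across all files.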

2. In ASP, you can use global.asa to record each visitor's HTTP_USER_AGENT.

3. In both PHP and ASP, you can put code in individual pages that saves HTTP_USER_AGENT, and then review the recorded spider visits.

ASP sample code:

Here is the code for robots.asp:
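<%
' robots.asp: if the current request comes from a known search engine spider,
' append a record of the requested page to robots/<engine>_robots.txt
Sub robot()
      ' Each entry is "User-Agent substring@engine name"; entries are separated by "|"
      Dim robots:robots="Baiduspider+@Baidu|Googlebot@Google|ia_archiver@Alexa|IAArchiver@Alexa|ASPSeek@ASPSeek|YahooSeeker@Yahoo|SogouBot@sogou|help.yahoo.com/help/us/ysearch/slurp@Yahoo|sohu-search@SOHU|MSNBOT@MSN"
      Dim entries,parts,ua,isBot,botName,scriptName,i
      Dim FilePath,Fso,Fout
      isBot=False
      ua=Request.ServerVariables("HTTP_USER_AGENT")
      scriptName=Request.ServerVariables("SCRIPT_NAME")
      entries=Split(robots,"|")
      For i=0 To UBound(entries)
            parts=Split(entries(i),"@")
            ' Case-insensitive substring match; the first matching entry wins
            If InStr(LCase(ua),LCase(parts(0)))>0 Then
                  isBot=True:botName=parts(1):Exit For
            End If
      Next
      If isBot And Len(botName)>0 Then ' it is a spider, so log the visit
            ' "/robots/" is root-relative, so the same folder is used whichever page calls robot()
            FilePath=Server.MapPath("/robots/"&botName&"_robots.txt")
            ' Open for appending (8 = ForAppending), creating the file if it does not exist (True)
            Set Fso=Server.CreateObject("Scripting.FileSystemObject")
            Set Fout=Fso.OpenTextFile(FilePath,8,True)
            Fout.WriteLine "Indexed page: "&scriptName
            Fout.WriteLine "Spider: "&botName&"  Updated: "&Now()
            Fout.WriteLine "-----------------------------------------------"
            Fout.Close
            Set Fout=Nothing
            Set Fso=Nothing
      End If
End Sub
%>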
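The heart of robot() is the lookup: split the robots string on "|", split each entry on "@", and do a case-insensitive substring test against the User-Agent. Here is a hypothetical Python port of just that lookup, handy for checking which engine a given User-Agent would be filed under. The function names are my own, and the file-writing part is left out. The spec string is copied unchanged from robots.asp.

```python
# Spec string copied from robots.asp: "User-Agent substring@engine name", "|"-separated.
ROBOTS = ("Baiduspider+@Baidu|Googlebot@Google|ia_archiver@Alexa|"
          "IAArchiver@Alexa|ASPSeek@ASPSeek|YahooSeeker@Yahoo|SogouBot@sogou|"
          "help.yahoo.com/help/us/ysearch/slurp@Yahoo|sohu-search@SOHU|MSNBOT@MSN")


def parse_robots(spec):
    """Split "pattern@label|pattern@label" into (lowercased pattern, label) pairs."""
    pairs = []
    for entry in spec.split("|"):
        pattern, _, label = entry.partition("@")
        pairs.append((pattern.lower(), label))
    return pairs


def identify_robot(user_agent, pairs):
    """Return the engine label of the first pattern found in the User-Agent, else None.

    Like InStr(LCase(ua), LCase(pattern)) > 0 in the ASP code, this is a
    case-insensitive substring test where the first matching entry wins.
    An entry with an empty label yields None, just as robot() then skips logging.
    """
    ua = (user_agent or "").lower()
    for pattern, label in pairs:
        if pattern in ua:
            return label or None
    return None


if __name__ == "__main__":
    pairs = parse_robots(ROBOTS)
    print(identify_robot("Baiduspider+(+http://www.baidu.com/search/spider.htm)", pairs))  # Baidu
    print(identify_robot("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", pairs))  # None
```

Because the first match wins, order matters. Put more specific substrings before more general ones if two entries could both match the same User-Agent.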

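First create a robots folder in your site's root and put robots.asp inside it. The IIS account must be allowed to write to this folder, because the log files are created there. Then include robots.asp in a shared file that every page loads, and call the function from that file. Most sites use a database, and the database connection file is normally included on every page, so just include robots.asp there and add Call robot().
For reference, here is the database connection file from my small site: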


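<!--#include virtual="/robots/robots.asp"-->
<%
' Shared database connection file, included by every page
Set Conn=Server.CreateObject("ADODB.Connection")
Connstr="DBQ="&Server.MapPath("data/gata.mdb")&";DefaultDir=;DRIVER={Microsoft Access Driver (*.mdb)}"
Conn.Open Connstr
' Record spider visits on every page that includes this file
Call robot()
%>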
Here is a line from my new site's log, recording a visit by Baidu's spider yesterday:
2008-04-13 02:07:13 W3SVC314147887 125.32.112.38 GET /index.html - 80 - 220.181.38.174 Baiduspider+(+http://www.baidu.com/search/spider.htm) 304 0 0
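This line means: on April 13, 2008 at 02:07:13 (IIS W3C logs record times in UTC), Baidu's spider, from IP 220.181.38.174, requested /index.html. The server answered 304 (Not Modified), meaning the page had not changed since the copy the spider already had.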
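The reading above can be automated by splitting each log line into named fields. Below is a minimal Python sketch. The field order is an assumption inferred from the sample line, matching the IIS 6-style W3C extended format. Real log files list their fields in a "#Fields:" header line, and the parser switches to that list whenever it sees one. Splitting on whitespace works because IIS W3C logs write spaces inside values (for example in User-Agent strings) as "+". The names `parse_w3c`, `describe`, and `STATUS_TEXT` are illustrative, not part of any library.

```python
# Field order assumed from the sample line (IIS 6-style W3C extended log).
DEFAULT_FIELDS = [
    "date", "time", "s-sitename", "s-ip", "cs-method", "cs-uri-stem",
    "cs-uri-query", "s-port", "cs-username", "c-ip", "cs(User-Agent)",
    "sc-status", "sc-substatus", "sc-win32-status",
]

# A few common HTTP status codes and their standard reason phrases.
STATUS_TEXT = {"200": "OK", "304": "Not Modified", "404": "Not Found"}


def parse_w3c(lines, fields=None):
    """Turn W3C log lines into dicts keyed by field name.

    A "#Fields:" directive replaces the assumed field list; other "#" lines,
    blank lines, and lines with the wrong number of values are skipped.
    """
    fields = list(fields or DEFAULT_FIELDS)
    records = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        values = line.split()
        if len(values) == len(fields):
            records.append(dict(zip(fields, values)))
    return records


def describe(record):
    """One-line summary: when, who, which page, and what status came back."""
    status = record.get("sc-status", "")
    return "{} {} {} {} -> {} {}".format(
        record.get("date"), record.get("time"), record.get("c-ip"),
        record.get("cs-uri-stem"), status, STATUS_TEXT.get(status, "")).rstrip()


if __name__ == "__main__":
    line = ("2008-04-13 02:07:13 W3SVC314147887 125.32.112.38 GET /index.html - 80 - "
            "220.181.38.174 Baiduspider+(+http://www.baidu.com/search/spider.htm) 304 0 0")
    for rec in parse_w3c([line]):
        print(describe(rec))  # 2008-04-13 02:07:13 220.181.38.174 /index.html -> 304 Not Modified
```

Combined with the spider check from the ASP code, this lets you list exactly which pages each spider fetched and whether they had changed.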


Dianbo Forum (点拨论坛), Newbie Home (菜鸟家园) Archiver
