Hack 93. Hide Part of Your Web Site from Yahoo!


6.3.1. Server Authentication

The best way to keep pages and files out of view of the general public is to place them behind server authentication. Server authentication is the web server's attempt to verify the identity of a particular user by requesting a username and password. The authentication is set at the server level.

Slurp can't enter a username and password if it encounters a server-authenticated page, so you can be sure that anything behind this wall will not be indexed. You can set authentication permissions on a directory or file, and it's fairly easy to set up with both Apache and Microsoft's Internet Information Server (IIS).

Imagine you have a directory on your server called /private and you'd like to keep any pages or files out of Yahoo! Search results. Apache includes many ways to set authentication, but a straightforward method involves setting a .htaccess file. The .htaccess file tells Apache how to configure a particular directory, and you can add a .htaccess file to the /private directory with the following information:













AuthName "Please enter your login info."
AuthType Basic
AuthUserFile /your/path/to/.htpasswd
AuthGroupFile /dev/null
require user insertusername



Note that AuthUserFile points to a file that contains the username and password of the authenticated user, and you'll need to change /your/path/to/ to a real directory on your server that's not accessible via the Web. The next step is to create that password file with the htpasswd tool. Enter the following command from a command prompt:







htpasswd -c /your/path/to/.htpasswd insertusername



The tool prompts you for the user's password, then creates the proper .htpasswd file for that user and puts in place all of the pieces for basic HTTP authentication.
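To see why a crawler without credentials is stopped at this wall, it can help to look at what a browser sends once you type in the username and password. The following is a minimal Python sketch, not part of the original hack: the function name basic_auth_header and the password "secret" are hypothetical, chosen only for illustration. It builds the Authorization header that HTTP Basic authentication expects.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value used by HTTP Basic authentication."""
    # Basic auth joins the username and password with a colon and base64-encodes them.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical credentials, for illustration only.
header = basic_auth_header("insertusername", "secret")
print("Authorization:", header)
```

A request that arrives without a valid header of this kind is answered with a 401 status, which is all Slurp ever sees. Note that base64 is an encoding, not encryption, so Basic authentication is best paired with HTTPS.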

To get the same results on a Windows server running IIS, open the IIS manager and find the directory you'd like to protect. Right-click the directory and choose Properties → Directory Security. Click Edit under "Anonymous Access and Authentication Control," and you'll see the window shown in Figure 6-2.



Figure 6-2. Authentication Methods prompt in IIS



Uncheck the "Anonymous access" box to require authentication. Check "Integrated Windows authentication" for a bit more security or "Basic authentication" for the most basic HTTP authentication. Once you set one of these, only authenticated users will be able to view the files or subdirectories of /private, and Slurp won't be allowed in.



6.3.2. robots.txt Exclusions

If server authentication seems like overkill and you'd rather make your directory or files available to everyone except Slurp, you can do so with a robots.txt file, which indicates how you'd like robots to behave at your site. Well-behaved bots (such as Slurp) check for robots.txt before indexing anything, to make sure they're acting as the site owner wants them to.

With robots.txt, you can tell Slurp that you'd like it to exclude certain directories or files from its crawl. For example, if you'd like Slurp to skip a directory called /private, save the following lines to a file called robots.txt:







User-agent: Slurp
Disallow: /private/



You can also tell Slurp to skip specific files:









User-agent: Slurp
Disallow: /Private.doc
Disallow: /Private.html
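Before uploading a robots.txt file, you may want to check that the rules say what you think they say. Here is a rough sketch using Python's standard urllib.robotparser module; the helper name allowed, the page paths, and the agent name OtherBot are made up for this example, and the rules simply combine the two snippets above.

```python
from urllib.robotparser import RobotFileParser

# The directory and file rules from the examples above, combined in one file.
RULES = """\
User-agent: Slurp
Disallow: /private/
Disallow: /Private.doc
Disallow: /Private.html
"""


def allowed(agent: str, url: str, rules: str = RULES) -> bool:
    """Report whether the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("Slurp", "http://example.com/private/notes.html"))
print(allowed("Slurp", "http://example.com/index.html"))
print(allowed("OtherBot", "http://example.com/private/notes.html"))
```

Under these rules the first check should come back False and the other two True: only Slurp is named, so other bots are not restricted, and anything outside the listed paths stays open to Slurp.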



Once you've listed all of the files and directories you'd like to hide, add robots.txt to the root directory of your web site, so it has a URL like this:





http://example.com/robots.txt



If a human reads your robots.txt file, they'll see a list of the files and directories you've asked Yahoo! not to index. While robots.txt will keep some bots away, it won't keep people from viewing the files. Private files should always be placed behind server authentication where a password is required to access them.



If you'd like to deny entry to all robots across all areas of your site, you can use a wildcard, like this:







User-agent: *
Disallow: /



Keep in mind that only bots that adhere to the robots.txt standard will play by the rules. People are free to build bots any way they want, and some ignore robots.txt altogether. Luckily, Slurp will always play by the rules.
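The point that compliance is voluntary is easier to see in code. This sketch, again in Python with the standard urllib.robotparser module, shows what a well-behaved bot does with the wildcard rules: it checks before it fetches. The names polite_fetch and fake_fetch are hypothetical, and fake_fetch stands in for a real HTTP request so the example runs without a network connection.

```python
from urllib.robotparser import RobotFileParser

# The wildcard rules: every robot is asked to stay out of the whole site.
DENY_ALL = ["User-agent: *", "Disallow: /"]


def polite_fetch(agent, url, rules, fetch):
    """Call fetch(url) only if the robots.txt rules permit it for this agent."""
    parser = RobotFileParser()
    parser.parse(rules)
    if not parser.can_fetch(agent, url):
        return None  # a compliant bot stops here
    return fetch(url)


def fake_fetch(url):
    """Stand-in for a real HTTP request."""
    return f"<html>contents of {url}</html>"


# A bot that honors robots.txt gets nothing back.
print(polite_fetch("Slurp", "http://example.com/", DENY_ALL, fake_fetch))

# A bot that skips the check simply fetches the page anyway.
print(fake_fetch("http://example.com/"))
```

Nothing in robots.txt prevents the second call. The check lives entirely in the bot's own code, which is why private files belong behind server authentication.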



6.3.3. robots Meta Tags

Another way to guide the Slurp bot on a page-by-page basis as it crawls your web site is through special HTML meta tags. Meta tags add extra information to a web page and are located toward the top of the page, between the <head></head> tags. To keep Slurp from indexing a particular page, add the following tag:

<meta name="robots" content="noindex">









This will ensure that the page will not show up in Yahoo! Search results. Many web crawlers look for this tag, so adding this robots tag will affect more than just Yahoo!: other search engines, such as Google, will also skip the page.

If you'd like search engines to index the page, but not keep a copy in their cache, you can use the following tag:

<meta name="robots" content="noarchive">









Using this tag means your page will show up in search results, but the search engine will not store a copy of the page that its users can view. Again, this will affect more than just Yahoo!, because many search engines also obey this tag.
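If you maintain many pages, it can be handy to confirm which robots directives each one actually carries. The following Python sketch uses the standard html.parser module and assumes the usual <meta name="robots" content="..."> form of the tag; the class name RobotsMetaParser, the helper robots_directives, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # A single tag may list several comma-separated directives.
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks crawlers not to index or cache it.
PAGE = (
    '<html><head><meta name="robots" content="noindex, noarchive">'
    "</head><body>Hello</body></html>"
)
print(sorted(robots_directives(PAGE)))
```

For the sample page this should report both noarchive and noindex. A page with no robots meta tag yields an empty set, which crawlers treat as permission to index and cache.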

Now that you know how to speak Slurp's language, you can make sure that your private or semiprivate information doesn't turn up in Yahoo! Search results, and you can control what Yahoo! sees in the first place.







Hack 94. Search Your Web Site with Yahoo!



Offer your readers a way to search your site without hiring a team of developers to build it.

As a serious web addict, I'm frequently frustrated by sites that don't offer a way to search their content. Navigating through a maze of different sections, trying to find that one piece of information I'm after, feels like a waste of time. Often I'm forced to leave the site, bring up Yahoo! Web Search, and use its site: meta keyword. The site: shortcut lets you specify that the search results should be limited to a single domain.

For example, if you browse to mountaindew.com, you're immediately blasted with metal music and an extreme Flash animation showing various extreme sports. But all I'm after is the number of calories in a can of soda, and there's not a search form in sight. I could try clicking on a few of the menu items, but I'd probably find more images of skateboarding than useful information. So I surf over to Yahoo! and type:





site:mountaindew.com calories



In a few seconds, I have a specific link deep within the mountaindew.com site that has the information I'm after, as shown in Figure 6-3.



Figure 6-3. Search results for "calories" limited to mountaindew.com
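The same query can be turned into a link programmatically, which is handy if you want to generate search links for your own pages. This is a small Python sketch under a few assumptions: the function name site_search_url is hypothetical, the endpoint and the p parameter are taken from the search URL used in this hack, and whether that endpoint still accepts the query in this form is not something the sketch verifies.

```python
from urllib.parse import urlencode

# Endpoint and "p" parameter as they appear in this hack's search URL.
SEARCH_ENDPOINT = "http://search.yahoo.com/search"


def site_search_url(domain: str, terms: str) -> str:
    """Build a search URL whose query is limited to one domain via site:."""
    query = f"site:{domain} {terms}"
    # urlencode takes care of escaping the colon and the space.
    return SEARCH_ENDPOINT + "?" + urlencode({"p": query})


print(site_search_url("mountaindew.com", "calories"))
```

Letting urlencode handle the escaping means search terms containing spaces, ampersands, or other reserved characters won't break the URL.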



PepsiCo, Inc. (the owner and operator of http://www.mountaindew.com) probably has the budget and team of developers to build its own search engine for the site if it wanted to. But many individuals and small businesses don't have that luxury. Since Yahoo! is probably already indexing any pages you have on the Web, you could easily give your readers a shortcut to searching your site with a custom Yahoo! search form.



6.4.1. The Code

In addition to the site: meta keyword, some tweaks to a standard Yahoo! Web Search URL will also limit results to a single domain. Here's a search URL that will return the same results as the previous example:





http://search.yahoo.com/search?p=calories&vs=mountainde


