Hack 21. Track Additions to Yahoo!


You're interested in trend tracking. Which categories are consistently busy? Which are all but dead? By watching how Yahoo! adds sites to categories, over time you'll get a sense of the rhythms and trends and detect when unusual activity occurs in a category.

This hack scrapes the recent counts of additions to Yahoo! categories and prints them out, providing an at-a-glance look at additions to various categories. You'll also get a tab-delimited table of how many sites have been added to each category for each day. A tab-delimited file is excellent for importing into a spreadsheet, where you can turn the count numbers into a chart.

1.22.1. The Code

Save the following code to a file called hoocount.pl:

#!/usr/bin/perl -w
use strict;
use Date::Manip;
use LWP::Simple;
use Getopt::Long;

$ENV{TZ} = "GMT" if $^O eq "MSWin32";

# the homepage for Yahoo!'s "What's New".
my $new_url = "http://dir.yahoo.com/new/";

# the major categories at Yahoo!. their names double as
# the hash keys that hold our counts strings.
my @categories = ("Arts & Humanities",    "Business & Economy",
                  "Computers & Internet", "Education",
                  "Entertainment",        "Government",
                  "Health",               "News & Media",
                  "Recreation & Sports",  "Reference",
                  "Regional",             "Science",
                  "Social Science",       "Society & Culture");
my %final_counts; # where we save our final readouts.

# load in our options from the command line.
my %opts; GetOptions(\%opts, "c|count=i");
die "Usage: $0 --count <days>\n" unless $opts{c}; # count sites from past $i days.

# if we've been told to count the number of new sites,
# then we'll go through each of our main categories
# for the last $i days and collate a result.

# begin the header
# for our import file.
my $header = "Category";

# from today, going backwards, get $i days.
for (my $i = 1; $i <= $opts{c}; $i++) {

    # create a Date::Manip time that will
    # be used to construct the last $i days
    my $day; # query for Yahoo! retrieval.
    if ($i == 1) { $day = "yesterday"; }
    else         { $day = "$i days ago"; }
    my $date = UnixDate($day, "%Y%m%d");

    # add this date to
    # our import file.
    $header .= "\t$date";

    # and download the day. LWP::Simple's get( ) returns
    # undef on failure and does not set $!, so say what failed.
    my $url  = "$new_url$date.html";
    my $data = get($url) or die "Could not fetch $url\n";

    # and loop through each of our categories.
    foreach my $category (sort @categories) {
        # take the first number after the category name,
        # or 0 if the category isn't on the page at all.
        my $count = $data =~ /\Q$category\E.*?(\d+)/ ? $1 : 0;
        $final_counts{$category} .= "\t$count";
    }
}

# with all our counts finished,
# print out our final file.
print $header . "\n";
foreach my $category (@categories) {
    print $category, $final_counts{$category}, "\n";
}
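The two steps at the heart of the script, building the date stamps and pulling a count out of a page, can be tried offline. The following is a minimal sketch, not the hack itself: it swaps Date::Manip for the core POSIX and Time::Local modules, the helper names day_stamps and count_for are hypothetical, and the page fragment is invented for illustration rather than taken from Yahoo!.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Return the YYYYMMDD stamps for the $days days before $now (epoch
# seconds), most recent first. GMT is used so the result does not
# depend on the local time zone.
sub day_stamps {
    my ($days, $now) = @_;
    return map { strftime("%Y%m%d", gmtime($now - $_ * 86_400)) } 1 .. $days;
}

# Return the first run of digits that follows $category in $html,
# or 0 if the category (or any number after it) is not found.
sub count_for {
    my ($html, $category) = @_;
    return $html =~ /\Q$category\E.*?(\d+)/ ? $1 : 0;
}

# Hypothetical page fragment, made up for this example.
my $sample = "Computers & Internet (30) Education (0) Health (11)";

# A fixed reference time: 12 July 2005, 12:00 GMT.
my $noon = timegm(0, 0, 12, 12, 6, 2005);

print join("\t", "Category", day_stamps(2, $noon)), "\n";
print join("\t", $_, count_for($sample, $_)), "\n"
    for ("Computers & Internet", "Health", "Science");
```

Because the pattern takes the first number anywhere after the category name, a category that is listed without a count of its own would pick up the next number on the page. That is worth keeping in mind if the page layout differs from what the script expects.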



1.22.2. Running the Hack

The only argument you need to provide to the script is the number of days back you'd like it to travel in search of new additions. Since Yahoo! doesn't archive its "new pages added" indefinitely, a safe upper limit is around two weeks. Here, we're looking at the past two days:

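% perl hoocount.pl --count 2
Category	20050711	2005071

The rest of the output is one tab-delimited row per category (Arts & Humanities, Business & Economy, Computers & Internet, Education, Entertainment, Government, Health, News & Media, Recreation & Sports, Reference, Regional, Science, Social Science, Society & Culture), each name followed by that category's count for each day. In this run, the daily counts ranged from 0 to 81.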

1.22.3. Hacking the Hack

If you're not only a researcher but also a Yahoo! observer, you might be interested in how the number of sites added changes over time. To that end, you could run this script under cron or the Windows Scheduler and output the results to a file. After three months or so, you'd have a pretty interesting set of counts to manipulate with a spreadsheet program.
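Once several runs have been saved, the tables can also be combined before they reach the spreadsheet. Here is a rough sketch of one way to total the counts per category. The add_table helper is hypothetical, and the two runs are invented data in the same tab-delimited shape the script prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Add the daily counts from one hoocount.pl-style table to a running
# total per category. $totals is a hash reference, updated in place.
sub add_table {
    my ($totals, $table) = @_;
    my @lines = split /\n/, $table;
    shift @lines;                          # skip the "Category<TAB>date..." header
    for my $line (@lines) {
        my ($category, @counts) = split /\t/, $line;
        next unless defined $category && length $category;
        # ignore anything that isn't a plain non-negative integer.
        $totals->{$category} += $_ for grep { /^\d+$/ } @counts;
    }
    return $totals;
}

# Two hypothetical runs, invented for this example.
my $run1 = "Category\t20050711\t20050710\nHealth\t11\t0\nScience\t6\t2\n";
my $run2 = "Category\t20050713\t20050712\nHealth\t4\t1\nScience\t0\t3\n";

my %totals;
add_table(\%totals, $_) for ($run1, $run2);
print "$_\t$totals{$_}\n" for sort keys %totals;
```

The sketch assumes the saved runs cover non-overlapping days. If a scheduled job asks for two days back but runs daily, each day would be counted twice, so either match the --count value to the schedule or deduplicate by date first.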

Kevin Hemenway and Tara Calishain

Hack 22. Yahoo! Directory Mindshare in Google

How does link popularity compare in Yahoo!'s searchable subject index versus Google's full-text index? Find out by calculating mindshare!

Yahoo! and Google are two very different animals. Yahoo! indexes only a site's main URL, title, and description, while Google builds full-text indexes of entire sites. Surely there's some interesting cross-pollination when you combine results from the two.

This hack scrapes all the URLs in a specified subcategory of the Yahoo! directory. It then takes each URL and gets its link count from Google. Each link count provides a nice snapshot of how a particular Yahoo! category and its listed sites stack up on the popularity scale.

What's a link count? It's simply the total number of pages in Google's index that link to a specific URL.

There are a couple of ways you can use your knowledge of a subcategory's link count. If you find a subcategory whose URLs have only a few links each in Google, you may have found a subcategory that isn't getting a lot of attention from Yahoo!'s editors. Consider going elsewhere for your research. If you're a webmaster and you're considering paying to have Yahoo! add you to its directory, run this hack on the category in which you want to be listed. Are most of the links really popular? If they are, are you sure your site will stand out and get clicks? Maybe you should choose a different category.

We got this idea from a similar experiment done by Jon Udell (http://weblog.infoworld.com/udell) in 2001. He used AltaVista instead of Google; see http://udell.roninhouse.com/download/mindshare-script.txt. We appreciate the inspiration, Jon!

1.23.1. The Code

You will need a Google API account (http://api.google.com) as well as the Perl modules SOAP::Lite (http://www.soaplite.com) and HTML::LinkExtor (http://search.cpan.org/author/GAAS/HTMLParser/lib/HTML/LinkExtor.pm) to run the following code. You'll also need a copy of the Google WSDL file in the same directory as the script (http://api.google.com/GoogleSearch.wsdl). Save the following code to a file called mindshare.pl:

#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;

my $google_key  = "your API key goes here";
my $google_wdsl = "GoogleSearch.wsdl";
my $yahoo_dir   = shift || "/Computers_and_Internet/Data_
                  "eXtensible_Markup_Language_/RS

# download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir) or

# create our Google object.
my $google_search = SOAP::Lite->service("file:$google_w


