Managing eight-terabyte Telco SAS Warehouse and make use of
Transkript
Managing eight-terabyte Telco SAS Warehouse and make use of
Managing eight-terabyte Telco SAS Warehouse and make use of SPD Server for performance enhancements Esra Bayram TURKCELL The Premier GSM Operator in Turkey Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Table of Contents • • • • • • • • • About Turkcell Turkcell Data Mining Environment Turkcell SAS Warehouse Architecture Turkcell Data Mining Projects Turkcell SAS Warehouse ETL SPDS Server Configuration SPDS Server Security Model SPDS Server Enhancements SAS EM4.3 Performance Tips Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 About Turkcell Leading GSM operator with 28.7 million1 Among the top five mobile operators in Europe in terms of number of subscribers Capital expenditures of US$4.7 billion only in Turkey as of December 2005 Covers almost 100% of Turkey with over 10,000 base stations Among the leaders in Europe with respect to GPRS and MMS services offered to customers Global operator with international operations and worldwide roaming including international, GPRS and Camel 1. As of 31 March 2006 Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 About Turkcell Directly and indirectly creates employment for more than 20,000 people in Turkey alone The only Turkish company listed on the New York Stock Exchange (NYSE) Represented on the Board (21 members) and Executive Management Committee of the GSM Association (13 members) Promotes ISO9001, EU and US regulatory, financial and corporate governance standards throughout its operations Massive tax contribution to the Turkish government Socially responsible with extensive and internationally renowned educational, cultural and sports sponsorships Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Coverage Area As of end of March 2006, Turkcell covers 96.58% of Turkish population and 78.48% of the country’s geography. Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 INTERNATIONAL ROAMING Turkcell is among the top ten operators for international roaming and is the world’s #1 for GPRS roaming Oceans are also included in Turkcell roaming and GPRS roaming coverage through an agreement reached with MCP Norway in 2005 Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 As April 2006; International Roaming: 501 operators in 191 countries GPRS Roaming: 250 operators in 98 countries Active Camel Roaming: 112 operators in 68 countries Passive Camel Roaming: 119 operators in 68 countries International and International GPRS Roaming on Oceans Turkcell Services Turkcell offers products and services concurrently with global operators Mobile Payment Mobile e-mail Ring Back Tone Mobile Music Mobile Video Messenger Visual Radio Push to Talk WAP Blog Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Corporate Solutions By leveraging advancements in GSM technology, Turkcell is making mobile solutions an essential tool for business and government Blackberry Teleconference m-municipality m-government Messaging services Mobile marketing IVR Services Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell Data Mining Environment - 1 Data Mining EUL Data Mart (Viper) SAS EM5.3 SAS EG 3.0 SAS/ACCESS Interface to PC Files SAS/ETS Software SAS/STAT Software Data Mining ETL Data Mart (Miner) OLAP TOOL TURKCELL DATAWAREHOUSE (20TB) ETL EXTRACT / TRANSFORM / LOAD Customer Billing OPERATIONAL DATA Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Call Turkcell Data Mining Environment - 2 Data Mining ETL Data Mart(Miner) • • • • • • Sun Fire 4800 Hostname : Miner 12X1200MHz CPU (1200 MHz), 24 GB Memory 11.5 TB Disk Space (8.5 TB Full ) Development and Production(ETL) Environment • • • • • • • • • • • • • SAS Base 9.1.3 , SAS/Share, SAS/Stat, SAS/Access to Oracle, SAS/Access to Sybase, SAS/Warehouse Administrator, SAS/Connect, SAS/Enterprise Miner, SPDS 4.3 (Dev/Prod) SAS Integration Technologies, SAS/GRAPH, SAS/ETS, SAS/QC Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell Data Mining Environment - 3 Data Mining EUL Data Mart(Viper) • • • • • • • Sun V490 Hostname : Viper , 4X1500MHz CPU 16 GB Memory 1.6 TB Disk Space (1.1 TB Full ) Server Environment – SAS Enterprise Miner – SAS Integration Technologies – Enterprise Miner Server – Enterprise Miner Thin Client – SAS/ACCESS Interface to ORACLE User Environment – Windows – SAS EM5.2 – SAS EG 3.0 – SAS/ACCESS Interface to PC Files – SAS/ETS Software – SAS/STAT Software Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell SAS Warehouse Architecture-1 Development Environment • • • On Miner Server Different SPDS4.3 Server Different Libraries (tmp_dev, odd_dev, dm_dev, dev_user libraries) libname=tmp_dev pathname=/cadm18/SPDS4/meta/tmp_dev owner=sasspds roptions="metapath=('/cadm18/SPDS4/meta/tmp_dev') datapath=('/cadm17/SPDS4/data/tmp_dev' '/cadm18/SPDS4/data/tmp_dev' '/cadm19/SPDS4/data/tmp_dev') indexpath=('/cadm19/SPDS4/index/tmp_dev') • • • • • Different file systems for SPDS libraries Different Unix Users Logically separated from Production Environment Read permission on Production Libraries Different SPDS and SAS Work Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell SAS Warehouse Architecture-1 Production Environment • • • On Miner Server Different SPDS4.3 Server Different Libraries (tmptcell, oddtcell, dmtcell) libname=tmptcell pathname=/cadm13/SPDS4/meta/tmptcell owner=sasspds roptions="metapath=('/cadm13/SPDS4/meta/tmptcell') datapath=('/cadm24/SPDS4/data/tmptcell' '/cadm29/SPDS4/data/tmptcell' '/cadm35/SPDS4/data/tmptcell' '/cadm04/SPDS4/data/tmptcell' '/cadm12/SPDS4/data/tmptcell' '/cadm33/SPDS4/data/tmptcell') indexpath=('/cadm24/SPDS4/index/tmptcell' ‘ ‘/cadm29/SPDS4/index/tmptcell')" • • • • • Different file systems for SPDS libraries Different Unix User Logically separated from Development Environment Read Permission on Development Libraries Different SPDS and SAS Work Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell SAS Warehouse Architecture-2 End User Environment • • • On Viper Server, Windows SPDS4.3 Server is installed Each End user and EM Projects have their own libraries – 29 end users, 20 EM projects – libnames.parm libname=usr_a pathname=/cadm01/spdsmeta/usr_a owner=sasspds roptions="metapath=('/cadm01/spdsmeta/usr_a') datapath=('/cadm02/spdsdata/usr_a' '/cadm03/spdsdata/usr_a' '/cadm04/spdsdata/usr_a' ) indexpath=('/cadm01/spdsindex/usr_a')“ libname=prj_a pathname=/cadm01/spdsmeta/prj_a owner=sasspds roptions="metapath=('/cadm01/spdsmeta/prj_a') datapath=('/cadm02/spdsdata/prj_a' '/cadm03/spdsdata/prj_a' '/cadm04/spdsdata/prj_a' ) indexpath=('/cadm01/spdsindex/prj_a')“ • Read permission on Production and Development Libraries Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell Data Mining Projects • Payment Behavior Segmentation (Credit Scoring) • Application Scoring • Customer Value Score (CVS) • Churn Prediction • Needs Based Segmentation • Credit Limit Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Turkcell SAS Warehouse – ETL • LSF ( Screen shot) • How is the projects scheduled ? • How many processes are running ? • How is the new projects are added ? Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 LSF and SAS/Warehouse Administrator is used for job scheduling Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Appropriate Calendar is created in LSF Codes are copied from Development Env to Production Env Codes are defined in a SAS catalog Jobs are defined for SAS catalog programs in SAS/WA Process Editor Using LSF Scheduler Add-Ins jobs are sent to LSF Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Calendars used : Project Name AppMDL Credit Scoring CreditScoringUPS CreditScoringCPS AppDay Loyalty Churn YTS CoprCVS CVS CBS AppMon Credit Scoring Reporting Calendar Name miner_AppMDL miner_billing miner_billing miner_billing miner_daily miner_Loyalty miner_monthly miner_monthly miner_monthly miner_monthly miner_monthly miner_reporting miner_reporting Calendar Definition Application Scoring Model Development after 3rd billing date of each month after 3rd billing date of each month after 3rd billing date of each month daily at 01:00 pm Loyalty Run Time 5-8 th day of each month 5-8 th day of each month 5-8 th day of each month 5-8 th day of each month 5-8 th day of each month 1-5 th day of each month 1-5 th day of each month Period Updated Calendar Name Update the autoexec yearmonth Miner_Monthly miner_monthly YES miner_billmonth miner_billing YES miner_billreport miner_billreport miner_billrepmonth Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Extraction from Oracle: 104jobs, 130Hrs Transformation in SAS: 161 jobs, 120Hrs 68jobs, 120 Hrs 24jobs, 20Hrs miner_billrepmonth miner_reporting Concurrent ETL Sessions: 8 Upload from SAS: miner_billmonth Miner_Reporting ETL Period : 20 days DataMart in SAS: miner_billrepmonth Miner_Billing # of jobs : 357 LSF jobs YES SPD Server Configuration-1 • There are three SPD Servers for Development, Production and End User Layer Miner(ETL machine) Development SPDS4.3 spdsna 5180/tcp spdssn 5181/tcp Production SPDS4.3 spdsna43 5200/tcp spdssn43 5201/tcp Viper (EUL machine) EUL SPDS4.3 spdsna43 5200/tcp spdssn43 5201/tcp Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 #SPDS name server port for SAS9 #SPDS Snet server port for SAS9 #SPDS name server port for SAS9 SPDS4.3 #SPDS Snet server port for SAS9 SPDS4.3 #SPDS name server port for SAS9 SPDS4.3 #SPDS Snet server port for SAS9 SPDS4.3 SPD Server Configuration-2 Production spdsserv.parm Development spdsserv.parm EUL spdsserv.parm SORTSIZE=2048M; INDEX_SORTSIZE=512M; GRPBYROWCACHE=256M; BINBUFSIZE=32K; INDEX_MAXMEMORY=30M; WORKPATH="('/cadm03/SpdsW ork4.3‘)”; NOCOREFILE; SEQIOBUFMIN=64K; RANIOBUFMIN=4K; MAXWHTHREADS=32; MAXSEGRATIO=75; WHERECOSTING; RANDOMPLACEDPF; SORTSIZE=2048M; INDEX_SORTSIZE=128M; GRPBYROWCACHE=128M; BINBUFSIZE=32K; INDEX_MAXMEMORY=30M; WORKPATH="('/cadm38/SpdsWo rkDev4.3')"; NOCOREFILE; SEQIOBUFMIN=64K; RANIOBUFMIN=4K; MAXWHTHREADS=32; MAXSEGRATIO=75; WHERECOSTING; RANDOMPLACEDPF; MINPARTSIZE=2048M; SORTSIZE=2048M; INDEX_SORTSIZE=512M; GRPBYROWCACHE=256M; BINBUFSIZE=32K; INDEX_MAXMEMORY=30M; WORKPATH="('/cadm01/Spds Work4.3' )"; NOCOREFILE; SEQIOBUFMIN=64K; RANIOBUFMIN=4K; MAXWHTHREADS=16; MAXSEGRATIO=75; WHERECOSTING; RANDOMPLACEDPF; MINPARTSIZE=512M; MINPARTSIZE=2048M; Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Security Model-1 • Unix user is created first. – useradd –u userid username • This user must be added to the SPDS User list. – Psmgr • add yyucel SASpw9 SASpw9 3 - - SUPPORT - • chgpass yyucel SASpw9 SASpw2 SASpw2 • A macro is written to assign security an easy way it accepts two parameters, library to be secured and users who have write access all the rest will have read access • Users are login with their Unix users password, and assign the library with the spds user created the same as unix user. libname tmp_dev sasspds 'tmp_dev' server=miner.spdsna43 user="&SYSUSERID" passwd="SASpw2" IP=YES; Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Security Model-2 %macro libsecure(_libname=, _users=); ........................................................... proc SPDO lib=&_libname; set ACLTYPE DATA; set acluser &user; delete acl _all_; %if &user=sasadm %then add acl/libname persist; add acl /generic persist; modify acl _all_/read nowrite noalter nocontrol; .......................................................... %do ind2=1 %to &numofusers; %let userpass = %sysfunc(scan(&_users,&ind2,' ')); &userpass=(y,y,y,y) %end; /*do loop*/ quit; ...................................................... %mend; libname tmp_dev sasspds 'tmp_dev' server=miner.spdsprod user="sasadm" passwd="SASpw2" unixdomain=yes netcomp=no aclspecial=yes; %libsecure(_libname=tmp_dev, _users=ebayram sasprod churnusr sasdev hsari ukaradag); Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Security Model-3 Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Enhancements-1 • MINPARTSIZE=2GB Most of the tables are bigger than 2GB. MINPARTSIZE was 128M before. Increasing to 2GB, result in less data partitions for a table and decrease the number of I/O requests. Test: Reading from a table having 22 partitions was 100% faster than reading from the same table having only 351 partitions. • SORTSIZE Increased from 48MB to 2048MB Few MERGE jobs would benefit from the extra memory allocated for SORT. The machine has 24GB of RAM. • The MAXWHTHREAD server parameter was set to 64 threads. During scalability tests, it was determined that 32 threads were adequate. By cutting the number by half, This would prevent the server from over threading and further reduce thread contention during queries. Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Enhancements-2 126 127 128 129 130 131 132 133 134 135 136 137 data &perm_tmp_lib..Corporate_Indicators30 ; merge &perm_tmp_lib..Corporate_Indicators(in=in1) &perm_tmp_lib..SegmentIndicators_&RefPeriod &perm_tmp_lib..ScoreIndicators_&RefPeriod &perm_tmp_lib..NumDiffBnoIndicators_&RefPeriod ; by Party_id; if in1; run; NOTE: There were 170660 observations read from the data set TMPTCELL.CORPORATE_INDICATORS. NOTE: There were 35093410 observations read from the data set TMPTCELL.SEGMENTINDICATORS_200508. NOTE: There were 27819995 observations read from the data set TMPTCELL.SCOREINDICATORS_200508. NOTE: There were 38038783 observations read from the data set TMPTCELL.NUMDIFFBNOINDICATORS_200508. NOTE: The data set TMPTCELL.CORPORATE_INDICATORS30 has 170660 observations and 1412 variables. NOTE: Compressing data set TMPTCELL.CORPORATE_INDICATORS30 decreased size by 63.23 percent. NOTE: DATA statement used (Total process time): real time 1:19:12.66 cpu time 56:52.57 Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Enhancements-3 140 proc sql; 141 create table &perm_tmp_lib..Corporate_Indicators31 as select a.*, b.*,c.* ,d.* 142 from &perm_tmp_lib..Corporate_Indicators a 143 left join &perm_tmp_lib..SegmentIndicators_&RefPeriod b on a.party_id=b.party_id 144 left join &perm_tmp_lib..ScoreIndicators_&RefPeriod c on a.party_id=c.party_id 145 left join &perm_tmp_lib..NumDiffBnoIndicators_&RefPeriod d on a.party_id=d.party_id 146 ; NOTE 49-169: The meaning of an identifier after a quoted string may change in a future SAS release. Inserting white space between a quoted string and the succeeding identifier is recommended. NOTE 49-169: The meaning of an identifier after a quoted string may change in a future SAS release. Inserting white space between a quoted string and the succeeding identifier is recommended. NOTE: Physical Name: TMPTCELL NOTE: Compressing data set TMPTCELL.CORPORATE_INDICATORS31 decreased size by 63.23 percent. NOTE: Table TMPTCELL.CORPORATE_INDICATORS31 created, with 170660 rows and 1412 columns. 147 quit; NOTE: PROCEDURE SQL used (Total process time): real time 25:30.96 cpu time 55.61 seconds Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Enhancements-4 177 178 proc sql; create table &perm_tmp_lib..Corporate_Indicators2 as select a.*, b.*,c.* ,d.*,e.*,f.*,g.*,h.*,j.*,k.*,l.*,m.*,n.* 179 from &perm_odd_lib..Corporate_Indicators a 180 left join &perm_tmp_lib..SegmentIndicators_&RefPeriod b on a.party_id=b.party_id 181 left join &perm_tmp_lib..ScoreIndicators_&RefPeriod c on a.party_id=c.party_id 182 left join &perm_tmp_lib..NumDiffBnoIndicators_&RefPeriod d on a.party_id=d.party_id 183 left join &perm_tmp_lib..SekreterCellIndicators_&RefPeriod e on a.party_id=e.party_id 184 left join &perm_tmp_lib..ServiceIndicators_&RefPeriod f on a.party_id=f.party_id 185 left join &perm_tmp_lib..TelephoneNumHistInd_&RefPeriod g on a.party_id=g.party_id 186 left join &perm_tmp_lib..AccountStatsIndicators_&RefPeriod h on a.party_id=h.party_id 187 left join &perm_tmp_lib..Postpaidpayment_&RefPeriod j on a.party_id=j.party_id 188 left join &perm_tmp_lib..Freeminuteindicators_&RefPeriod k on a.party_id=k.party_id 189 left join &perm_tmp_lib..Freeminuteindicators2_&RefPeriod l on a.party_id=l.party_id 190 left join &perm_tmp_lib..MergeCallStatsIndicators_&RefPeriod m on a.party_id=m.party_id 191 left join &perm_tmp_lib..Loyaltyindicators_&RefPeriod n on a.party_id=n.party_id 192 ; NOTE: PROCEDURE SQL used (Total process time): real time 1:16:48.51 cpu time 1:15.06 Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SPD Server Enhancements-5 • Adding option IP=YES to all SPD Server libname statements. The option would enable implicit SQL pass-through Counting the rows of a table with proc sql is very fast now. In SAS EM 4.3 also, when you choose a table from Input Data Source, all the rows are counted. For a 4GB table, it lasts 2 minutes to count the rows before, but 2 seconds after the option is set. This option will increase the speed of choosing a table from Input Data Source node in case of counting the rows. Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 SAS EM4.3 Performance Tips-1 • Duration of opening an EM4.3 project is too long. • Duration of listing the libraries in IDS is too long. We decreased the number of libraries from 67 to 26 Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 Number of Libraries Duration to open the EM4.3 project (sec) Duratio to list the libraries in IDS (sec) 3 42 9 10 50-70 17 15 50-70 18 25 100-120 42 35 100-130 41 45 100-130 46 55 100-130 54 67 100-130 63 SAS EM4.3 Performance Tips-2 #of TMPTCELL Libraries (769 tables) • Duration of listing the tables in IDS is too long. Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 EERDAL (42 tables) 3 120 5 10 131 5 15 87 6 25 99 4 35 92 5.5 45 86 6 55 96 5.5 67 96 6 SAS EM4.3 Performance Tips-3 # of Records • Duration of opening a table in IDS is too long. Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006 100,000 200,000 300,000 400,000 500,000 600,000 700,000 800,000 900,000 1,000,000 1,200,000 1,400,000 1,600,000 1,800,000 2,000,000 2,159,786 2,159,786 2,159,786 2,159,786 2,159,786 2,159,786 2,159,786 2,159,786 2,159,786 # of Columns 564 564 564 564 564 564 564 564 564 564 564 564 564 564 564 50 100 150 200 250 300 350 400 564 Table Name Size of Table (MB) Duration to open a table in IDS(secs) ODDTCELL.TEST100bin ODDTCELL.TEST200bin ODDTCELL.TEST300bin ODDTCELL.TEST400bin ODDTCELL.TEST500bin ODDTCELL.TEST600bin ODDTCELL.TEST700bin ODDTCELL.TEST800bin ODDTCELL.TEST900bin ODDTCELL.TEST1000bin ODDTCELL.TEST1200bin ODDTCELL.TEST1400bin ODDTCELL.TEST1600bin ODDTCELL.TEST1800bin ODDTCELL.TEST2000bin ODDTCELL.NTEST50 ODDTCELL.NTEST100 ODDTCELL.NTEST150 ODDTCELL.NTEST200 ODDTCELL.NTEST250 ODDTCELL.NTEST300 ODDTCELL.NTEST350 ODDTCELL.NTEST400 CKOYLU.NEEDS_FINAL 402 803 1,205 1,606 2,008 2,410 2,811 3,213 3,614 4,016 4,819 5,622 6,425 7,229 8,032 788 1,606 2,430 3,279 4,185 4,663 5,487 6,265 8,673 0.3 0.6 1.0 1.3 2.0 2.7 3.1 2.6 3.2 3.6 4.1 5.2 5.3 6.0 7.3 0.5 1.1 1.9 2.7 6.4 2.8 6.0 4.6 7.1 THANK YOU Corporate Communications Turkcell Genel Sunum Ing 280206 28.02.2006