本文为您介绍如何构建业务数据及Dataphin的计算引擎源。

背景信息

通常,您的业务数据可以通过创建同步任务或管道任务的方式,导入Dataphin平台进行数据建模及管理。本教程为了让您快速熟悉智能数据构建与质量管理功能的流程,以TPC-DS中的店铺销售场景为例构建业务数据。

构建业务数据

本教程以MySQL数据源为例,且数据源中包含storecustomerstore_sales数据表,数据表的结构如下:
  • 店铺维度的数据表(store)。
    字段 类型
    s_store_sk bigint
    s_store_id string
    s_rec_start_date string
    s_rec_end_date string
    s_closed_date_sk bigint
    s_store_name string
    s_number_employees int
    s_floor_space int
    s_hours string
    S_manager string
    S_market_id int
    S_geography_class string
    S_market_desc string
    s_market_manager string
    s_division_id int
    s_division_name string
    s_company_id int
    s_company_name string
    s_street_number string
    s_street_name string
    s_street_type string
    s_suite_number string
    s_city string
    s_county string
    s_state string
    s_zip string
    s_country string
    s_gmt_offset double
    s_tax_percentage double
  • 顾客维度的数据表(customer)。
    字段 类型
    c_customer_sk bigint
    c_customer_id string
    c_current_cdemo_sk bigint
    c_current_hdemo_sk bigint
    c_current_addr_sk bigint
    c_first_shipto_date_sk bigint
    c_first_sales_date_sk bigint
    c_salutation string
    c_first_name string
    c_last_name string
    c_preferred_cust_flag string
    c_birth_day int
    c_birth_month int
    c_birth_year int
    c_birth_country string
    c_login string
    c_email_address string
    c_last_review_date_sk bigint
  • 事实数据表(store_sales)。
    字段 类型
    ss_sold_date_sk bigint
    ss_sold_time_sk bigint
    ss_item_sk bigint
    ss_customer_sk bigint
    ss_cdemo_sk bigint
    ss_hdemo_sk bigint
    ss_addr_sk bigint
    ss_store_sk bigint
    ss_promo_sk bigint
    ss_ticket_number bigint
    ss_quantity int
    ss_wholesale_cost double
    ss_list_price double
    ss_sales_price double
    ss_ext_discount_amt double
    ss_ext_sales_price double
    ss_ext_wholesale_cost double
    ss_ext_list_price double
    ss_ext_tax double
    ss_coupon_amt double
    ss_net_paid double
    ss_net_paid_inc_tax double
    ss_net_profit double
向数据表写入数据:
  • 执行以下命令,向数据表customer写入数据。
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0001',10001,20001,00001,20200618,20200616,'aaaa','xiaoming','wang','fuzhuang',15,3,1990,'china','hangzhou','wangxiaoming@126.com',20200701);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0002',10002,20002,00002,20200618,20200616,'aaaa','sanxiao','wang','shipin',26,3,1989,'china','xian','wangsanxiao@126.com',20200630);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0003',10003,20003,00002,20200619,20200616,'aaaa','xiaomao','liang','fuzhuang',25,3,1998,'china','hangzhou','liangxiaomao@126.com',20200701);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0004',10004,20004,00001,20200619,20200617,'aaaa','maomao','zhang','shipin',15,4,1991,'china','xian','zhangmaomao@126.com',20200630);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0005',10005,20005,00001,20200618,20200617,'aaaa','liying','zhao','fuzhuang',15,5,1994,'china','hangzhou','zhaoliying@126.com',20200701);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0006',10006,20006,00003,20200620,20200616,'aaaa','duoduo','liang','bangongyongpin',24,3,1992,'china','xian','liangduoduo@126.com',20200630);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0007',10007,20007,00001,20200620,20200617,'aaaa','manling','zhao','jiaju',16,6,1993,'china','hangzhou','zhaomanling@126.com',20200630);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0008',10008,20008,00004,20200618,20200617,'aaaa','shuaidai','wang','fuzhuang',17,6,1995,'china','hangzhou','wangshuaidai@126.com',20200629);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c0009',10009,20009,00005,20200619,20200617,'aaaa','fangfang','zhou','shipin',18,6,1993,'china','xian','zhoufangfang@126.com',20200630);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c00010',100010,200010,00004,20200619,20200617,'aaaa','xiaoxiao','wang','fuzhuang',15,7,1996,'china','hangzhou','wangxiaoxiao@126.com',20200629);
    insert into customer(c_customer_sk,c_customer_id,c_current_cdemo_sk ,c_current_hdemo_sk,c_current_addr_sk ,c_first_shipto_date_sk ,c_first_sales_date_sk,c_salutation ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_day ,c_birth_month ,c_birth_year ,c_birth_country ,c_login ,c_email_address ,c_last_review_date_sk ) values(11111,'c00011',100011,2000111,00005,20200618,20200617,'aaaa','yiyi','wang','shipin',19,6,1990,'china','xian','wangyiyi@126.com',20200628);
  • 执行以下命令,向数据表store写入数据。
    insert into store (store_sk,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10001,'c0001','20200618','20200619',20200620,'A',15,10,'2','zhangsan',10001,'1001','shipin','lisi',20001,'ganguo',22001,'A01','S01','tangyanlu','B01','C01','hangzhou','china','success','D01','china','3.1','1.8') ;
    insert into store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10002,'c0002','20200519','20200520',20200520,'B',15,10,'3','zhangxiaomao',10002,'1002','shipin','zhangsan',20002,'ganguo',22002,'A02','S02','tangyanlu','B02','C02','hangzhou','china','success','D02','china','3.2','1.9') ;
    insert into store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10003,'c0003','20200520','20200521',20200520,'A',15,10,'4','zhangmao',10003,'1003','fuzhuang','zhangsi',20003,'nvzhuang',22003,'A03','S03','yananlu','B03','C03','hangzhou','china','success','D03','china','3.3','2.0') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10004,'c0004','20200519','20200520',20200520,'B',15,10,'5','zhangmaomao',10004,'1004','shipin','zhangmaomao',20002,'ganguo',22002,'A04','S04','luoyanglu','B05','C04','hangzhou','china','success','D04','china','3.4','2.1') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10005,'c0005','20200517','20200518',20200520,'B',15,10,'6','zhangmaomao',10005,'1005','fuzhuang','zhangmaomao',20002,'nanzhuang',22002,'A05','S05','luoyanglu','B06','C04','hangzhou','china','success','D05','china','3.5','2.3') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10005,'c0005','20200515','20200520',20200520,'B',15,10,'7','zhaomaomao',10006,'1006','shipin','zhangmaomao',20002,'ganguo',22002,'A06','S06','jingyelu','B07','C04','hangzhou','china','success','D06','china','3.6','2.1') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10007,'c0007','20200515','20200519',20200520,'B',15,10,'8','zhaoduoduo',10007,'1007','shipin','zhangmaomao',20002,'ganguo',22002,'A07','S07','luoyanglu','B08','C04','xian','china','success','D07','china','3.7','2.2') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10008,'c0008','20200514','20200518',20200520,'B',15,10,'9','zhangsanmiao',10008,'1008','fuzhuang','zhangmaomao',20002,'nvzhuang',22002,'A08','S08','luoyanglu','B09','C04','xian','china','success','D08','china','3.8','2.1') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(10009,'c0009','20200515','20200517',20200521,'B',15,10,'10','zhangmaoyi',10009,'1009','shipin','zhangmaomao',20002,'ganguo',22002,'A09','S09','jingyelu','B04','C10','xian','china','success','D09','china','3.9','2.4') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(100010,'c00010','20200513','20200516',20200523,'B',15,10,'11','xumaomao',10010,'1010','shipin','zhangmaomao',20002,'ganguo',22002,'A10','S10','luoyanglu','B04','C11','hangzhou','china','success','D10','china','4.2','2.5') ;
    insert into  store (store_sk ,store_id ,s_rec_start_date ,s_rec_end_date ,s_closed_date_sk ,store_name ,s_number_employees ,s_floor_space ,s_hours ,S_manager ,S_market_id ,S_geography_class ,S_market_desc ,s_market_manager ,s_division_id ,s_division_name ,s_company_id ,s_company_name ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,s_country ,s_gmt_offset ,s_tax_percentage )values(100011,'c00011','20200519','20200516',20200520,'B',15,10,'12','hangogo',10011,'1011','shipin','zhangmaomao',20002,'ganguo',22002,'A11','S11','sanqiao','B04','C12','hangzhou','china','success','D11','china','3.4','2.6') ;
  • 执行以下命令,向数据表store_sales写入数据。
    insert into table store_sales partition (ds='${bizdate}')values(20200518,2030,30001,11111,10001,20001,00001,10001,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales   partition (ds='${bizdate}')values(20200519,2130,30002,11112,10002,20002,00002,10002,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200519,2030,30003,11111,10003,20003,00002,10003,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200520,2230,30004,11112,10004,20004,00001,10004,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200519,2030,30005,11113,10005,20005,00001,10005,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200520,2030,30006,11112,10006,20006,00003,10006,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200519,2230,30007,11111,10007,20007,00001,10007,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200520,2030,30008,11112,10008,20008,00004,10008,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200519,2130,30009,11113,10009,20009,00005,10009,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200519,2030,30010,11111,100010,200010,00004,100010,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);
    insert into store_sales partition (ds='${bizdate}')values(20200519,1930,30011,11112,100011,200011,00005,100011,12,10,20,3.3,68.8,58.5,4.8,3.5,5.8,6.8,5.6,4.2,2.3,3.3,9.8);

构建Dataphin的计算引擎

创建MaxCompute(ODPS)项目(dqe_demo_devdqe_demo_prod),详情请参见创建工作空间
说明
  • Prod项目计算源对应的MaxCompute(ODPS)项目名称与Dataphin新建的项目名称(dqe_demo_prod)保持一致。
  • MaxCompute(ODPS)项目(dqe_demo_devdqe_demo_prod)分别连接到Dataphin中的Dev项目和Prod项目。