GitHub resource roundup: open-source top solutions from big data competitions

via: 博客园     time: 2018/11/16 22:39:45     views: 473

Nowadays, more and more enterprises, universities, and academic organizations run data competitions to identify outstanding talent in data science, to encourage participants to find breakthrough solutions for a particular data domain or application scenario, and to leave valuable experience for future data researchers.

On GitHub, Smilexuhc has compiled the top solutions from data competitions, covering both pure data competitions and competitions in natural language processing (NLP). Readers interested in these events can browse this resource-packed post together.

Pure data contest

1. 2018 iFLYTEK AI Marketing Algorithm Competition

Participants were asked to predict the probability that a user clicks an ad, using artificial intelligence techniques on large-scale advertising data from the iFLYTEK AI marketing cloud. The competition provides five types of data: basic advertising data, advertising material information, media information, user information, and context information. In total there are 1,001,650 preliminary-round records and 1,998,350 semi-final records (the semi-final training data is the preliminary-round data plus the semi-final data).

Rank1: https://zhuanlan.zhihu.com/p/47807544

2. 2018 IJCAI Alimama Search Advertising Conversion Prediction

This competition takes Alibaba e-commerce advertising as its research object. Using the massive real transaction data provided by the Taobao platform, participants build a model that predicts users' purchase intent with artificial intelligence techniques. The competition provides five types of data: basic data, advertised commodity information, user information, context information, and store information. The preliminary-round data contains samples from several days. The data from the last day is used to evaluate results and is not disclosed to competitors; the data from the remaining days is provided as training data.
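The split described above, with the final day held out for evaluation and the earlier days used for training, can be imitated locally for validation. The following is a minimal sketch rather than the organizers' code; the record layout and the `day` field name are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_last_day(
    records: List[Record], day_key: str = "day"
) -> Tuple[List[Record], List[Record]]:
    """Hold out the most recent day for evaluation; train on all earlier days.

    `day_key` is a hypothetical field name, not the competition's real schema.
    """
    if not records:
        return [], []
    last_day = max(r[day_key] for r in records)
    train = [r for r in records if r[day_key] != last_day]
    holdout = [r for r in records if r[day_key] == last_day]
    return train, holdout


# Toy rows standing in for transaction logs (all values are made up).
rows = [
    {"day": 1, "user": "a", "bought": 0},
    {"day": 1, "user": "b", "bought": 1},
    {"day": 2, "user": "a", "bought": 0},
    {"day": 3, "user": "c", "bought": 1},
]
train_rows, holdout_rows = split_by_last_day(rows)
```

Splitting by time instead of at random keeps the local validation set closer to the hidden evaluation day, because the model never sees data from the day it is scored on.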

3. 2018 Tencent Advertising Algorithm Competition

The task for this algorithm competition comes from an advertising technology product based on a real business scenario.

To protect business data, all data provided by the competition is desensitized. The data set is divided into a training set and a test set. The training set labels the users who belong to a seed package and those who do not (i.e. positive and negative samples). The test set is used to check whether a competitor's algorithm can accurately determine whether each test-set user belongs to the corresponding seed package. The seed packages of the training set and the test set are identical. The seed packages provided in the preliminary round and the semi-final differ in order of magnitude but are otherwise set up the same way.

4. 2018 University Big Data Challenge

The competition requires participants to predict which users will be active in the future, based on desensitized and sampled data. Teams design their own algorithms for data analysis and processing, and results are evaluated and ranked on online evaluation data according to the designated metrics. The data provided is user behavior data after desensitization and sampling. Dates are numbered uniformly: the first day is 01, the second day is 02, and so on. All files are tab-delimited.
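A file in that format can be read with Python's standard `csv` module. The sketch below assumes a hypothetical two-column layout of `user_id<TAB>day`; the real column layout of the competition files is not given here.

```python
import csv
import io
from typing import List, Tuple


def read_day_rows(text: str) -> List[Tuple[str, int]]:
    """Parse tab-delimited lines of the assumed form `user_id<TAB>day`.

    Zero-padded day labels such as "01" become integers so that days can be
    compared and subtracted.
    """
    rows: List[Tuple[str, int]] = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        user_id, day = fields
        rows.append((user_id, int(day)))
    return rows


# Made-up sample content in the assumed layout.
sample = "u1\t01\nu2\t02\nu1\t10\n"
parsed = read_day_rows(sample)
```

For a file on disk, the same reader can be applied to `open(path, newline="")` in place of `io.StringIO(text)`.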

5. 2018 JDATA User Purchase Time Prediction

This competition requires participants to design their own data processing and train their own models. Given the users who purchased the target commodity in the past three months, together with their browsing, purchasing, and review data from the previous year, participants predict which users are most likely to purchase the target commodity in the next month and the date of their first purchase within that period. The data mainly includes basic user information, basic SKU information, user behavior information, user order information, and review information.

Rank9: https://zhuanlan.zhihu.com/p/45141799

6. 2018 DF Wind Turbine Blade Cracking Early Warning

Based on real-time wind turbine SCADA data, participants are required to build an early fault detection model for blade cracking using machine learning, deep learning, statistical analysis, and other methods, so that blade cracking faults can be flagged early. The data set includes a training set and a test set: the training set has 40,000 samples from 25 wind turbines, and the test set has 80,000 samples without turbine numbers.

Rank2: https://github.com/SY575/DF-Early-warning-of-the-wind-power-system

7. 2018 DF Photovoltaic Power Generation Prediction

Building on an analysis of how photovoltaic power generation works, contestants are required to examine the factors that affect photovoltaic output power, such as irradiance and the working temperature of the panels. A prediction model is built from real-time monitoring of the panels' operating state parameters and meteorological parameters to predict the instantaneous power generation of a photovoltaic power plant. The predictions are then compared with the actual generation data from the plant's DCS system to verify the model's practical value.

The competition provides 9,000 training samples and 8,000 test samples, including photovoltaic panel operating state parameters (solar panel backplane temperature, and the voltage and current of its photovoltaic array) and meteorological parameters (solar irradiance, ambient temperature and humidity, wind speed, wind direction, etc.).

Rank1: https://zhuanlan.zhihu.com/p/44755488?utm_source=qq (this solution can also be read on WeChat, in an article on a high-scoring XGBoost, LightGBM, and LSTM model scheme from a machine learning contest)

8. AI Global Challenger Competition

This competition requires participants to build an accurate risk control model that predicts whether users will be overdue on repayment, based on the basic identity information, consumer behavior, bank repayment records, and other data of nearly 70,000 loan users provided by the Mashang Finance (马上金融) platform.

Rank1: https://github.com/chenkkkk/User-loan-risk-prediction

9. 2016 Rong360 User Loan Risk Prediction

The competition requires participants to build an accurate risk control model that predicts whether users will be overdue on repayment, based on the basic identity information, consumer behavior, and bank repayment data of nearly 70,000 loan users provided by Rong360 and the financial institutions on its platform.

Rank7: https://github.com/hczheng/Rong360

10. 2016 CCF O2O Coupon Usage Prediction

The competition requires participants to predict whether users will redeem a coupon within 15 days of receiving it in July 2016, based on the real online and offline consumption behavior of the given users between January 1, 2016 and June 30, 2016. AUC is the evaluation metric: the AUC of each coupon is calculated separately, and the average of the AUC values over all coupons is the final evaluation score.
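The per-coupon averaging described above can be sketched in plain Python. This is an illustration under stated assumptions, not the official scoring script: the `(coupon_id, label, score)` row layout is hypothetical, and skipping coupons that contain only one class (where AUC is undefined) is a choice made here rather than a documented contest rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def auc(labels: List[int], scores: List[float]) -> Optional[float]:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None when only one class is
    present, because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_coupon_auc(rows: Iterable[Tuple[str, int, float]]) -> float:
    """Average the per-coupon AUC over the coupons where it is defined."""
    by_coupon: Dict[str, Tuple[List[int], List[float]]] = defaultdict(
        lambda: ([], [])
    )
    for coupon_id, label, score in rows:
        by_coupon[coupon_id][0].append(label)
        by_coupon[coupon_id][1].append(score)
    aucs = [a for a in (auc(y, s) for y, s in by_coupon.values()) if a is not None]
    if not aucs:
        raise ValueError("no coupon has both positive and negative samples")
    return sum(aucs) / len(aucs)


# Toy predictions (made-up values): coupon "c1" is ranked perfectly,
# coupon "c2" gets one of its two positive/negative pairs wrong.
toy = [
    ("c1", 1, 0.9), ("c1", 0, 0.2), ("c1", 0, 0.1),
    ("c2", 1, 0.4), ("c2", 0, 0.6), ("c2", 0, 0.3),
]
score = mean_coupon_auc(toy)
```

The pairwise loop is quadratic in the number of samples per coupon, which is fine for a sketch; a rank-based formula would be the usual choice for large data. Averaging per coupon means a model is rewarded for ranking users correctly within each coupon, not for separating popular coupons from unpopular ones.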

Rank1: https://github.com/wepe/O2O-Coupon-Usage-Forecast

11. 2016 CCF Agricultural Product Price Prediction

The competition requires participants to forecast agricultural product prices in July based on price data from before June 2016. The preliminary round is based on price data from farm commodity markets across China, while the semi-final is based on weather and other multi-source data.

Rank2: https://github.com/xing89qs/CCF_Product

12. 2016 CCF Customer Power Consumption Anomaly

The State Grid monitors abnormalities in users and their transformers, and field maintenance personnel conduct spot checks on users according to the abnormal conditions and report the results. If a user is found to be stealing electricity, that user's information is reported as well. In this competition, the relevant data and the inspectors' results are provided, and participants are required to build an electricity-theft detection model that identifies users' electricity theft.

Rank4: https://github.com/AbnerYang/2016CCF-StateGrid

13. 2016 CCF Sogou User Profiling Competition

In the preliminary round, participants receive a training set of 20,000 users with their million-scale search terms and the true gender, age, and education level obtained from a survey. Using machine learning and data mining techniques, they build classification algorithms that analyze the search keywords of another 20,000 users and predict their gender, age, education level, and other user attributes. In the semi-final, the training set and the test set were both expanded to 100,000 users.

14. 2016 CCF Unicom User Trajectories

Precision marketing is a new direction in Internet marketing and advertising. In particular, when users are at specific locations and businesses, matching them by user profile and pushing the corresponding offers and advertising through different channels has become a new development direction for many Internet and non-Internet enterprises. Taking one such marketing scenario as an example, contestants are required to complete user profiling and merchant matching based on user location information, merchant categories, and merchant location information.

RankX: https://github.com/xuguanggen/2016CCF-unicom

15. 2016 CCF Human or Robots

In the first half of 2016, AdMaster's anti-cheating solution identified up to 28% of daily traffic on average as false traffic, i.e. non-human malicious traffic caused by bot simulation and black IPs. This competition requires participants to detect this false traffic automatically from user behavior logs.

Rank6: https://github.com/pickou/ccf_human_or_robot

16. Cainiao Demand Forecasting and Warehouse Allocation Planning

In this competition, participants are required to forecast the national and regional demand of a commodity in the next two weeks based on the data of a large number of buyers and sellers in the past year. Participants need to use data mining technology and methods to accurately depict the changing law of commodity demand, predict the future national and regional demand, and take into account the impact of future uncertainties on logistics costs, so as to achieve global optimization. The competition provides national and regional warehousing data for goods from October 10, 2014 to December 27, 2015.

Rank6: https://github.com/wepe/CaiNiao-DemandForecast-StoragePlaning

Rank10: https://github.com/xing89qs/TianChi_CaiNiao_Season2

Natural Language Processing (NLP)

1. 2018 DC Daguan Text Intelligent Processing Challenge

The competition requires participants to analyze the internal structure and semantic information of text, based on a batch of long-text data and classification labels provided by Daguan Data, and to build a text classification model with state-of-the-art NLP and artificial intelligence techniques that classifies the text accurately. The data consists of two CSV files: a training set and a test set.

2. Question Similarity Algorithm Design for Intelligent Customer Service

The competition takes natural language processing and text mining as its main research focus. Participants are required to develop an algorithm that improves the recognition ability and service quality of intelligent customer service, based on real data from the intelligent customer service chatbot provided by PPDai (拍拍贷).

3. 2018 JD Dialog Challenge: Task-Oriented Dialogue System Challenge

This competition requires contestants to use the provided real, desensitized dialogue data between JingDong users and JingDong human customer service agents to build an end-to-end, task-driven, multi-turn dialogue system whose answers meet users' needs.

Rank3: https://github.com/zengbin93/jddc_solution_4th

4. 2018 CIKM AnalytiCup

This competition focuses on language adaptation for short text matching. The source language is English and the target language is Spanish. Participants are required to build a cross-language short text matching model to improve the ability of intelligent customer service robots.

In addition, Smilexuhc provides two experience-sharing articles. If you are interested, bookmark them and learn from those who came before.

Experience article

Via: https://github.com/Smilexuhc/Data-Competition-TopSolution
