1 Lessons learned (my own):

1. Feature engineering matters far more than model tuning, by roughly an order of magnitude
2. When tuning, start the learning rate from a larger value to save time
3. In competitions involving dates, the choice of local CV dates is critical; they should resemble the final test dates
4. Record every local run (CV setup, parameters, score) carefully for later comparison and analysis
5. Many top competitors now use NNs for regression problems, and they do bring a sizable improvement over xgb/lgb

2 Lessons learned from others:

1. Feature engineering is extremely important
2. Neural networks are starting to beat boosted decision trees

2.1 Eureka 1st place solution

1. Basic features: categorical (store, item, family, class, cluster), on-promotion flag, day_of_week (only for model 3);
2. Statistical features over time windows. Recent windows: [1, 3, 5, 7, 14, 30, 60, 140]; equal windows: [1] * 16, [7] * 20. Key groupings: store x item, item, store x class. Targets: promotion, unit_sales, zeros. Aggregations: mean, median, max, min, std, days since last appearance, and the difference of mean value between adjacent time windows (only for equal windows)
3. Useless features: holidays, and other keys such as cluster x item and store x family
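The windowed statistics above can be sketched roughly as follows (a minimal pandas sketch, not the team's actual code; `window_stats`, the column names `date`, `store_nbr`, `item_nbr`, `unit_sales`, and the window set are all assumptions):

```python
import pandas as pd

def window_stats(sales: pd.DataFrame, end_date, windows=(1, 3, 7, 14)):
    """For each (store, item) key, aggregate unit_sales over the last
    `w` days before `end_date` with mean/median/max/min/std."""
    end_date = pd.Timestamp(end_date)
    frames = []
    for w in windows:
        start = end_date - pd.Timedelta(days=w)
        win = sales[(sales["date"] > start) & (sales["date"] <= end_date)]
        grp = win.groupby(["store_nbr", "item_nbr"])["unit_sales"]
        stats = grp.agg(["mean", "median", "max", "min", "std"])
        stats.columns = [f"sales_{s}_{w}d" for s in stats.columns]
        frames.append(stats)
    # outer join across windows; keys missing from a window become NaN
    return pd.concat(frames, axis=1)
```

The same loop would be repeated per grouping key (store x item, item, store x class) and per target (unit_sales, promotion, zeros).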

1. model_1 : 0.506 / 0.511 , 16 lgb models, one trained per forecast day
2. model_2 : 0.507 / 0.513 , 16 nn models, one trained per forecast day
3. model_3 : 0.512 / 0.515 , 1 lgb model for all 16 days with almost the same features as model_1
4. model_4 : 0.517 / 0.519 , 1 nn model based on @sjv's code

Stacking didn't work well this time; our best result is a linear blend of the 4 single models:
final submission = 0.42 * model_1 + 0.28 * model_2 + 0.18 * model_3 + 0.12 * model_4
public = 0.504 , private = 0.509

2.2 Luck Yu 2nd place solution overview

Data is fed in mini-batches of 128 randomly sampled sequences, and the start of the decode/target window is also chosen at random, so the model effectively sees different data on each training iteration (the full dataset is about 170,000 sequences x 365 days). Trained this way, the WaveNet should handle overfitting well.
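That sampling scheme might look like this (a hypothetical numpy sketch; `sample_batch` and all parameter names are assumptions, not Luck Yu's code):

```python
import numpy as np

def sample_batch(series, enc_len=365, dec_len=16, batch_size=128, rng=None):
    """series: (n_sequences, n_days) array of sales.
    Draw 128 random sequences, then a random decoder start date, so
    every iteration sees a different (sequence, window) pair."""
    rng = rng or np.random.default_rng()
    n_seq, n_days = series.shape
    rows = rng.integers(0, n_seq, size=batch_size)
    # decoder start must leave room for enc_len history and dec_len targets
    dec_start = rng.integers(enc_len, n_days - dec_len + 1)
    enc = series[rows, dec_start - enc_len:dec_start]   # encoder input
    dec = series[rows, dec_start:dec_start + dec_len]   # decoder targets
    return enc, dec
```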

2.3 slonoslon 3rd place solution overview

lgbm + nn. For time-series problems, the two most important things are validation and bagging.

About bagging: our final result includes models trained over 10 runs each, averaged directly.
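A minimal sketch of that bagging procedure, assuming some `train_fn(X, y, seed)` that returns a fitted model with a `predict` method (both names are hypothetical):

```python
import numpy as np

def bag_predictions(train_fn, X_train, y_train, X_test, n_runs=10):
    """Train the same model n_runs times with different seeds and
    average the test predictions directly (no stacking)."""
    preds = [train_fn(X_train, y_train, seed).predict(X_test)
             for seed in range(n_runs)]
    return np.mean(preds, axis=0)
```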

2.4 sjv 4th-Place Solution Overview:

Input representation: raw time-series values, categorical variables, and hand-crafted features (lags, diffs, rolling statistics, date features, conditioning time series, average sales for a given product/store/etc.).

Validation: validation windows were sampled at random from the past 365 days, to avoid overfitting to any single validation set or to particular weekly/monthly trends.
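One way to sketch this random-validation idea (the window count, horizon, and function name are assumptions, not sjv's code):

```python
import numpy as np
import pandas as pd

def random_val_windows(last_train_date, n_windows=8, horizon=16,
                       lookback=365, rng=None):
    """Draw validation windows with random start dates within the past
    `lookback` days, so the model is not tuned to one fixed window's
    weekly/monthly trends. Returns (start, end) timestamp pairs."""
    rng = rng or np.random.default_rng()
    end = pd.Timestamp(last_train_date)
    offsets = rng.integers(horizon, lookback, size=n_windows)
    starts = [end - pd.Timedelta(days=int(o)) for o in offsets]
    return [(s, s + pd.Timedelta(days=horizon - 1)) for s in starts]
```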

2.5 Lingzhi 5th Place Solution

lgbm: an upgraded version of the public kernel, with more features, more data, and more dates.

CNN+DNN: a traditional NN model, where the CNN part is a dilated causal convolution inspired by WaveNet, and the DNN part is 2 FC layers connected to raw sales sequences. The inputs are then concatenated with categorical embeddings and future promotions, and output directly to 16 future days of predictions.
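To illustrate the dilated causal convolution building block (a one-layer numpy sketch, not the 5th-place code): the key property is that the output at time t depends only on inputs at t, t-d, t-2d, ..., so no future sales leak into the prediction, and stacking layers with growing dilation gives a large receptive field cheaply.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """x: (timesteps,) input; w: (kernel,) weights.
    Left-pads with zeros so output[t] only sees x[t - j*dilation]."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```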

RNN: a seq2seq model with an architecture similar to @Arthur Suilin's solution for the web traffic prediction competition. Encoder and decoder are both GRUs. The hidden states of the encoder are passed to the decoder through an FC layer connector, which improves accuracy significantly.
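A toy numpy sketch of that encoder/connector/decoder wiring (untrained random weights; all names and sizes are assumptions, and it only illustrates the shape, not the real model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUCell:
    """Tiny numpy GRU cell with random weights (illustrative only)."""
    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        self.Wz, self.Wr, self.Wh = (
            rng.uniform(-s, s, (hid_dim, in_dim + hid_dim)) for _ in range(3))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                      # update gate
        r = sigmoid(self.Wr @ xh)                      # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

def seq2seq_forecast(history, horizon, hid_dim=8, seed=0):
    """GRU encoder reads the sales history, an FC 'connector' maps the
    encoder state to the decoder's initial state, and a GRU decoder
    rolls out `horizon` steps fed with its own previous prediction."""
    rng = np.random.default_rng(seed)
    enc, dec = GRUCell(1, hid_dim, rng), GRUCell(1, hid_dim, rng)
    W_conn = rng.standard_normal((hid_dim, hid_dim)) / np.sqrt(hid_dim)
    w_out = rng.standard_normal(hid_dim) / np.sqrt(hid_dim)
    h = np.zeros(hid_dim)
    for x in history:                                  # encode
        h = enc.step(np.array([x]), h)
    h = np.tanh(W_conn @ h)                            # FC connector
    y, outs = history[-1], []
    for _ in range(horizon):                           # decode
        h = dec.step(np.array([y]), h)
        y = float(w_out @ h)
        outs.append(y)
    return np.array(outs)
```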

2.6 Nicolas 6th Place Solution Overview

• ma_median * isd_avg / isd_week_avg - popularized early on in the competition.
• ma_median - Coupled with binary for whether or not it is equal to 0.
• Day-of-week averages - Different time periods including 7, 14, 28, 56, 112.
• Days since appeared - Difference between the 'start date' of the training cycle and the first date the item showed up in the original train file.
• Quantiles for several different time-spans.
• Whether the item will be onpromotion.
• Simple averages for different time-spans.
• Item-cluster means - The past 5-day mean was a good predictor.
• Future promotional 'sums' - E.g: sum of onpromotion 8/16 through 8/18.
• mean_no_zero_sales - Mean of instances where unit_sales > 0.
• Frequency encoding - calculate the frequency of appearance for items, stores, families and classes (four columns that each sum to 1).

• One-hot-encoding clusters, states, types, cities and families.
• Weekend and weekday means.
• Staggered mean data (e.g. 8/07 --> 8/14 for the test-set).
• Staggered quantile data.
• Past number of zero sales for different time-spans.
• Past number of promotional days.
• Past promotional sales averages.
• Past day-of-week quantile data.
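The frequency-encoding bullet above can be sketched like this (a pandas sketch; the column names are assumptions). Each new `_freq` column maps a category to its share of rows, so the frequencies over the unique categories of each column sum to 1:

```python
import pandas as pd

def frequency_encode(df, cols=("item_nbr", "store_nbr", "family", "class")):
    """Add one frequency column per categorical column: each category's
    share of total rows, looked up per row."""
    out = df.copy()
    for c in cols:
        out[c + "_freq"] = df[c].map(df[c].value_counts(normalize=True))
    return out
```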

2.7 CPMP 8th solution

A single NN scored 0.508 on public and 0.515 on private; a single lgb scored 0.514 on private.

Seasonality: in time-series work, the two key points are seasonality and the CV setup. In this competition the weekly seasonal effect is strong while the yearly effect is weak.

CV setting: two validation periods were used, 2017-07-15 to 2017-07-31 and 2017-08-01 to 2017-08-15.
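A sketch of that two-period split (assuming a `date` column; this is one interpretation of the setup, not CPMP's code), pairing each validation window with all data strictly before it:

```python
import pandas as pd

def two_period_cv(df, date_col="date"):
    """Return [(train, valid), (train, valid)] for the two windows:
    2017-07-15..2017-07-31 and 2017-08-01..2017-08-15."""
    folds = []
    for start, end in [("2017-07-15", "2017-07-31"),
                       ("2017-08-01", "2017-08-15")]:
        d = df[date_col]
        train = df[d < start]                 # everything before the window
        valid = df[(d >= start) & (d <= end)]
        folds.append((train, valid))
    return folds
```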

Giba commented: "I always say and continue saying, don't trust the public LB. Especially in a time-series challenge where the public LB is calculated using 5 days ahead and the private is from 6 to 16 days ahead, they are completely different models."