Article

Salespeople Performance Evaluation with Predictive Analytics in B2B

1
DCTI, Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, 1649-026 Lisboa, Portugal
2
INOV INESC Inovação—Instituto de Novas Tecnologias, 1000-029 Lisboa, Portugal
*
Author to whom correspondence should be addressed.
Submission received: 13 April 2020 / Revised: 6 June 2020 / Accepted: 8 June 2020 / Published: 11 June 2020
(This article belongs to the Special Issue Applied Machine Learning)

Abstract

Performance evaluation is a process that occurs multiple times per year in a company. During this process, the manager and the salesperson evaluate how the salesperson performed on numerous Key Performance Indicators (KPIs). To prepare for the evaluation meeting, managers have to gather data from the Customer Relationship Management (CRM) system, financial systems, and Excel files, among other sources, which makes the process very time-consuming. The result of the performance evaluation is a classification followed by actions to improve performance where needed. Nowadays, predictive analytics technologies make it possible to produce classifications based on data. In this work, the authors applied a Naive Bayes model to a dataset composed of sales made by 594 salespeople over 3 years at a global freight forwarding company, to classify salespeople into pre-defined categories provided by the business. There are 3 classes: Not Performing, Good, and Outstanding. The classification is based on KPIs such as growth volume and percentage, sales variability along the year, opportunities created, customer base line, and target achievement, among others. The authors assessed the performance of the model with a confusion matrix and derived measures such as True Positives, True Negatives, and F1 score. The results showed an accuracy of 92.50% for the whole model.

1. Introduction

Salesperson performance measurement is a process that occurs multiple times per year in a company. The performance evaluation is based on various Key Performance Indicators (KPIs) extracted from multiple systems, such as Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems.
Evaluating these KPIs can be time-consuming, as it requires analysing figures that involve complex calculations, making a judgment based on the values, and weighing the contribution of each KPI to the performance as a whole. The KPIs often include the amount of products/services sold by the salesperson, the number of opportunities created, the ability to sell multiple products/services, and the variability of sales along the year, among many others. When a company has dozens or hundreds of salespeople, this becomes an extensive process that may involve other departments, such as Human Resources (HR) and Operations.
The result of the performance evaluation is a classification followed by actions to improve performance where needed. Technology, through Data Mining (DM), is currently capable of making classifications based on data. DM is the process of exploration and analysis, by automatic or semiautomatic means, of large quantities of data to discover meaningful patterns and rules [1]. DM tasks are classified into two categories: descriptive and predictive [1]. Predictive tasks perform inferences on data to make predictions. Their goal is to create a predictive model that allows the data miner to predict an unknown value of a specific variable. When the result of the prediction is a number, the task is called regression, and when the result is a label, it is called classification [1].
DM classification capabilities can help improve the salesperson performance measurement process. Companies can take advantage of the classification capabilities of Predictive Analytics (PA) to support the judgment of KPIs that are based on complex calculations and the weight that each KPI contributes to the overall performance evaluation. By using classifications previously made by humans, companies can build models that classify the current sales of a salesperson and use the result in the performance evaluation. Through these models, it is possible to automate part of the performance evaluation process. The gains that these automated evaluations can bring to companies include, among others:
  • A reduction in the number of hours needed to analyse multiple KPIs and judge the salesperson’s performance, allowing managers and salespeople to focus on other tasks that bring value to the business
  • An improved Salesperson Performance Appraisal process, by providing in advance the likely future evaluation of the performance based on the salesperson’s current sales
  • The ability for the salesperson to act sooner, ahead of the performance measurement and appraisal
  • By allowing the salesperson to act sooner, companies may reduce salespeople turnover and, consequently, recruiting and training costs
In this work, the authors propose the use of DM techniques to help salespeople and sales leaders make better decisions about salespeople performance measurement, by building a model in R that classifies a salesperson’s performance based on metrics defined by the business. Although evaluation processes differ between companies, any company in the B2B area that has teams of salespeople measured on metrics can take advantage of this DM process.
The dataset used for this analysis is composed of data regarding salespeople performance measurement from a global freight forwarding company. The sales were made by 594 salespeople between January 2017 and June 2019. The measurement is based on the company’s internal performance measurement process, which is explained in this article at a very high level to give the reader an understanding of the data and the fields necessary to make the performance measurement. It is not the goal of this work to scientifically evaluate this company’s salespeople performance measurement process.

1.1. Research Contribution

The DM process applied in this work can be replicated in any company that has historical objective metrics and classifications applied to people based on these metrics.
The contributions of this paper are the following:
  • An evaluation of the use of a predictive analytics process to classify salespeople
  • A novel way to use predictive analytics in the salesperson performance evaluation process
  • The use of predictive analytics to reduce the workload needed to prepare the performance evaluation of a salesperson
  • Automation of the analysis of several sales KPIs (objective measures) to obtain a classification of the responsible salesperson

1.2. Paper Structure

The paper is structured as follows: Section 2 presents the literature review; Section 3 provides the background, where the company’s performance evaluation process is explained; Section 4 describes the work methodology and all the steps needed to prepare the data for modeling and evaluation; Section 5 presents the discussion; Section 6 presents the conclusion and proposals for future work.

2. Literature Review

2.1. Salesperson Performance

Academic studies demonstrate that the success of a salesperson normally has a direct relationship with company performance. Some authors state that: “When salespeople do well, the organization is likely doing well, and the contrary is normally true as well.” [2]. Salesperson performance can be measured with objective data, such as total sales increase, sales commissions, or percent of quota, and with subjective measures, such as a manager’s or peer’s assessment of the salesperson [3]. Many companies use a combination of objective and subjective KPIs to make the assessment. A meta-analysis of objective and subjective sales indicators suggests that the correlation between them is low, which shows that these indicators are not necessarily interchangeable and that choosing the most appropriate one may require a trade-off [2].
The performance evaluation process varies from company to company [4]. The activities of a job cannot be measured by objective or subjective measures alone, as some tasks require an objective method of evaluation, while for others subjective measures are better suited. Bikrant Kesari examined the impact of objective and subjective measures of evaluation in sales departments, using various methods. For the company being studied, Bikrant Kesari concluded that objective measures were the most relevant factor used in the salesperson evaluation, demonstrating a positive impact on performance [4].
Muhammad Ruhul Amin et al. evaluated the effectiveness of the weighted checklist method for appraising the performance of employees at different levels of a bank, based on self-assessment, competency and demonstration of leadership behaviours, and skill and knowledge assessment. The achievement classifications were made in 6 levels. The authors concluded that the impact of the method on employees was inevitable and that all the financial and non-financial benefits were effected due to the method [5].
John P. Campbell et al. defined individual job performance as the things people do, and the actions people take, that contribute to the organization’s goals [6]. In another article, Campbell et al. mention that performance is what directly facilitates achieving the organization’s goals [7].
Performance itself can be measured with judgmental measures and with nonjudgmental measures, the latter being the outcome measures [8]. Outcome measures use objective data, which require no abstraction by whoever collects the data [9]. There are three predominant methods of measuring sales performance: outcome measures, composed of sales volume and its variants; judgmental managerial ratings; and the salesperson’s self-evaluation [10]. In the current work, only objective measures are available, as the data provided for the study contain volume figures and other sales-related information, none of which relates to subjective measures.

2.2. Predictive Analytics for Sales

Predictive analytics is increasingly entering the business and academic fields [11]. Companies have increasingly been using DM to improve their internal processes and to automate not only repetitive tasks but also complex tasks currently completed by humans [12,13].
Academic authors report that PA has been used for several years by companies to gain a competitive advantage [14,15]: at first by companies in the B2C area with a large customer base and the capacity to collect and store transactional data from customers, and only later by companies in the B2B area [14].
B2B selling companies are hiring cloud-based PA providers that draw on both inside and outside data sources to identify new leads, so that they can take advantage of PA [16].
Mirzaei and Iyer conducted a comprehensive study on the application of PA to CRM data in 2014. They found 57 articles in 4 databases, with studies focused on dimensions such as Customer Acquisition, Attraction, Retention, Development, and Equity Growth [17]. The results also show that, between 2003 and 2013, PA techniques gained considerable popularity in areas such as casinos, retail, telecommunications, manufacturing, insurance, and healthcare [17].
To show what has been studied in academia in terms of predictive analytics, the authors describe below some success cases of PA applied to sales forecasting.

2.2.1. Sales Forecasting of Computer Products Based on Variable Selection Scheme and Support Vector Regression (SVR)

As in many other industries, sales forecasting is a challenge for computer product retailers. Wrong forecasts can cause product backlogs or inventory shortages, fail to meet customer demand, and decrease customer satisfaction [18].
Chi-Jie Lu et al. combined Multivariate Adaptive Regression Splines (MARS) with SVR to build a sales forecasting model for computer products. The main idea of the scheme was first to use MARS to select the essential forecasting variables and then to use those key variables as the input variables for SVR. The data used were a compilation of weekly sales data for five computer products from a computer retailer in Taiwan: notebooks, LCDs, main boards, hard drives, and display cards [18].

2.2.2. Fast Fashion Sales Forecasting with Limited Data and Time

Another success case concerns fast fashion, an industrial practice whose main idea is to offer a continuous stream of new merchandise to the market [19]. With this practice, some fashion companies are even capable of taking a product from conceptual design to final product in just two weeks. Companies working this way have to make their inventory decisions based on forecasts with a short lead time and a tight schedule. As a result, they forecast on a near real-time basis and with a minimal amount of data. TM Choi et al. proposed an algorithm called Fast Fashion Forecasting (3F), which gives companies the ability to make forecasts with limited data and time. This algorithm uses two artificial intelligence methods: the Extreme Learning Machine (ELM) and the Grey Model (GM). The data used belonged to a knitwear fashion company that applies the fast-fashion concept. The algorithm was tested with real and artificial sales data, and the results revealed an acceptable forecasting accuracy [19].

2.2.3. Support Vector Regression for Newspaper/Magazine Sales Forecasting

The next case is in the media area. Due to the constant transformations that information technologies are bringing to the world, new generations are increasingly used to browsing the internet for news and exciting stories [20]. The media industry therefore has to evolve to keep up with this progress. For that reason, it is more urgent for traditional media companies to forecast accurately the number of newspapers and magazines to print, to avoid excessive printing or not meeting the expected demand [20]. The authors of the study in question used SVR in a media company with printed newspapers/magazines to create a sales forecast that estimates and prepares the printing plan and distribution. The results of the study showed that SVR is a superior method for forecasting sales in the newspaper/magazine industry [20].
Having covered these scientific articles on success cases of PA in the B2C area, we move next to success cases in the B2B area.

2.2.4. On Machine Learning towards Predictive Sales Pipeline Analytics

In companies operating in B2B, new sales are often identified as Leads. These Leads then move into the Sales Opportunity Pipeline Management System, and later some of them are qualified into Opportunities. A sales Opportunity is a set of one or several products or services that the salesperson is trying to convert into a purchase. All Opportunities are tracked, ideally ending in won business that generates revenue for the company [21].
A fundamental part of the pipeline quality assessment is the lead-level win-propensity score, referred to as the win-propensity. The salesperson usually enters these scores. To avoid the noise and bias that salespeople introduce for various reasons, the authors of the article in question proposed a model that calculates the win-propensity using the Hawkes process model, and successfully deployed it in a multinational Fortune 500 B2B-selling company in 2013 [21].

2.2.5. Prescriptive Analytics for Allocating Sales Teams to Opportunities

Still on the topic of Opportunities, other authors used Predictive and Prescriptive Analytics to increase the revenue of a company by 15%. This increase was achieved by automating the allocation of sales resources to opportunities in order to maximize opportunity revenue in B2B selling [13].
The Predictive part was achieved by mining the historical selling data, through multiple linear regression, to learn sales response functions that capture the behavioral relationship between the size and composition of a sales team and the revenue earned for different types of customers and opportunities [13].
For the Prescriptive part, these authors used the sales response functions to determine the allocation of salespeople’s effort to customer opportunities that maximizes the overall revenue earned by the salespeople, using a piece-wise linear approximation [13].
As the above articles show, PA is widely used in sales, and the data used for these predictions are of the same type as the data needed to measure salespeople. With this foundation on PA for sales, the authors now move to the application of PA in HR, which is essential in this work because of the performance evaluation processes.

2.3. Predictive Analytics in HR Management

The articles studied in HR cover first how PA is being used for HR in general, and then how PA is being used for people performance evaluation and analysis.

2.3.1. How PA Is Being Used for HR in General

An article published in 2017 [22] proposes the use of PA in HR for:
  • Employee Profiting and Segmentation: the authors propose that it can be achieved by anticipating the standing of every employee to profit from learning opportunities or capitalize on new undertakings;
  • Employee Attrition and Loyalty Analysis: using predictive risk models to predict the potential loss of employees; by combining the attrition risk score with worker performance information, HR can identify high-performing employees and reduce potential attrition;
  • Forecasting of HR Capacity and Recruitment Needs: using PA to anticipate recruiting needs from the gap between the people to recruit and the people already employed, allowing HR to avoid under- and over-employment.
The authors of the article in question also propose research in Appropriate Recruitment Profile Selection, Employee Sentiment Analysis, and Employee Fraud Risk Management [22].
Sujeet N. Mishra et al. propose the use of Human Resource Predictive Analytics (HRPA) for decision making by presenting two success cases: one at a US wind turbine maker that changed its recruitment and retention policies based on HRPA, and another at Cisco, which used IBM SPSS to transform the relationship between its HR analysis and executive leaders [23]. Kessler et al. present the categorization module of E-Gen, a modular system that treats job listings automatically. Using SVM, these authors managed to rank candidate responses based on several pieces of information [24]. In another article, two authors used machine learning techniques to rank candidates in a recruiting process by analyzing each candidate’s adaptability to a job position based on the candidate’s tweets [25]. Other authors proposed an approach to evaluate job applications in online recruitment systems in order to solve the candidate ranking issue. They achieved this by analyzing the candidate’s LinkedIn profile and inferring personality characteristics through linguistic analysis of the candidate’s blog profile. For that, they used training data provided by human recruiters and applied the approach, using Regression Tree and SVR, in a large-scale recruitment scenario with three different positions and 100 applicants [26].

2.3.2. How PA Is Being Used in HR for Performance Evaluation and Analysis

Zhao, in a paper in the proceedings of the “International Seminar on Future Information Technology and Management Engineering” published in 2008, proposed a DM method for performance evaluation. Information about Ability, Attitude, Performance, Harvest, and Spirit was gathered in a dataset. The K-Expectation algorithm was then used to classify employees into groups. After that, a Decision Tree was used to train a model based on rules that managers can use to classify and select the best employees from the applicants [27].
Jing applied the Fuzzy Data Mining Algorithm (FDMA) to the performance evaluation of human resources. The author used evaluation records with four features: innovation ability, learning level, work efficiency, independence and workability. Each feature had four levels, which are the corresponding scores of that feature [28]. Jing then used the maximal tree to cluster the human resources. The next step was to compare the data from management with each cluster and calculate the proximal values based on the FDMA, and the last step was to determine the evaluation. The evaluation, in this case, was the closest of the 4 clusters, which are named Best, Better, General, and Worse [28].
Two authors applied Decision Trees to the performance analysis of human resources to make a classification analysis. The results show that there is mutual restraint and influence between performance results and working quality, tasks, skills, and attitude. They concluded that if the enterprise cultivates employees’ working skills and quality in the future, the employees will consciously improve themselves in these areas [29].
The above studies on PA for HR and performance evaluation are not based on data from sales made by salespeople. What is proposed in this article is the use of PA to evaluate salespeople using the sales they made, taking advantage of the data already available in the CRM and ERP systems and in previous performance evaluations. With that groundwork in place, we proceed to the background section, where the company’s salesperson evaluation process is described.

3. Background

In this section, the process and main KPIs used to evaluate salesperson performance are described at a very high level, to provide an understanding of the data and fields used in this research. It is not the goal of this work to scientifically evaluate this company’s salespeople performance measurement process.

3.1. Main KPIs Used for Salespeople Performance Evaluation

According to the process of the company that provided the data for this research, the main KPIs used to evaluate a salesperson’s performance are:
  • Customer Base, which is composed of the customers assigned to a salesperson in the current year
  • Customer Base line, which is the sum of the volume sold to the Customer Base in the previous year (0 is assumed for new customers)
  • Growth, which is the difference between the sum of the volume sold in the current year and the Base line
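The three KPI definitions above can be illustrated with a minimal Python sketch. It is not the company's implementation (the authors state that their model was built in R); the record layout, the customer names, and the volumes are all hypothetical and serve only to show how the base line and growth follow from the definitions.

```python
from collections import defaultdict

# Hypothetical shipment records: (customer, year, TEU volume).
shipments = [
    ("ACME", 2017, 40), ("ACME", 2018, 55),
    ("Globex", 2017, 20), ("Globex", 2018, 15),
    ("Initech", 2018, 10),  # new customer: no volume in 2017
]

def volume_by_customer(records, year):
    """Sum the TEU volume per customer for one year."""
    totals = defaultdict(int)
    for customer, record_year, teu in records:
        if record_year == year:
            totals[customer] += teu
    return totals

def base_line_and_growth(records, customer_base, current_year):
    """Return (Customer Base line, Growth) for the given Customer Base."""
    previous = volume_by_customer(records, current_year - 1)
    current = volume_by_customer(records, current_year)
    # The base line is last year's volume; new customers contribute 0.
    base_line = sum(previous.get(c, 0) for c in customer_base)
    current_volume = sum(current.get(c, 0) for c in customer_base)
    return base_line, current_volume - base_line

# Customer Base assigned to one (hypothetical) salesperson in 2018.
customer_base = ["ACME", "Globex", "Initech"]
base_line, growth = base_line_and_growth(shipments, customer_base, 2018)
print(base_line, growth)
```

With these invented figures, the base line is the 2017 volume of the three customers and the growth is the 2018 volume minus that base line.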

3.2. Assess Salespeople Performance

Based on the company’s performance evaluation process, there are a number of questions whose answers lead to the evaluation level. The answers to these questions, listed below, are provided by the KPIs:
  • What growth did the salesperson bring to the company?
  • Did the salesperson achieve the defined targets?
  • Do the targets assigned to the salesperson follow the company guidelines?
  • Other relevant KPIs: at this stage, a number of queries are made, ranging from an in-depth analysis of the sales fluctuation and Customer Base to the ratio of opportunities created for each customer
As displayed in Figure 1, the first level to verify is the growth, followed by whether the targets were achieved and, finally, whether the targets follow the company guidelines. Other relevant KPIs that contribute to salesperson performance are also assessed, but these three are the most important ones.

3.2.1. What Growth Did the Salesperson Bring to the Company?

Starting with the first query: “What growth did the salesperson bring to the company?”. A salesperson is assigned an Account Base that has, on average, 70 customers. The basis for the analysis is the growth, which is the difference between the number of Twenty-foot Equivalent Units (TEUs) sold in the current year and in the previous year. The base in the analysis is the sum of the growth for each year.

3.2.2. Did the Salesperson Achieve the Defined Targets?

Target definition in this company follows a top-down process. Targets are based on a roadmap that is defined globally by the sales controlling department. These targets are assigned to each region and then distributed by the regional managers to the countries. The process continues until it reaches the salesperson. As exemplified in Figure 2, a global roadmap of 10,000 TEUs was defined. These TEUs are shared among all the regions, ending with salespersons x and y in Lisbon with 30 TEUs each.
Although the company has implemented this process, the salesperson does not always get a reasonable target, because this depends on the strategy defined by local sales management, and in this company part of the strategy is defined locally. For instance, in Figure 2, all of Portugal’s targets are assigned to Lisbon and none to Oporto. If sales management in Portugal believes it is possible to achieve all targets with the 2 salespeople in Lisbon, it does not have to assign targets to salespeople in Oporto.
Besides the number of TEUs assigned to a region/country, there is also a target definition at the product level. This is another way of strategically directing the sales team towards a specific product. For instance, if a country has a larger market for Import, the sales manager should set targets on Import to boost Import sales.
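The top-down distribution described in this subsection can be sketched as a recursive split of the global roadmap. The Python sketch below is only an illustration under stated assumptions: the 10,000 TEU roadmap and the 30 TEUs for salespersons x and y come from the text, whereas the region name, every share, and the Oporto salesperson "z" are invented so that the arithmetic reproduces those two figures.

```python
from fractions import Fraction

ROADMAP_TEU = 10_000  # global roadmap from the text

# Hypothetical share tree: name -> (share of the parent's target, children).
# A node with children=None is a salesperson. All shares are assumptions.
SHARES = {
    "Europe": (Fraction(1, 5), {
        "Portugal": (Fraction(3, 100), {
            "Lisbon": (Fraction(1), {
                "x": (Fraction(1, 2), None),
                "y": (Fraction(1, 2), None),
            }),
            # Local management assigns nothing to Oporto in this example.
            "Oporto": (Fraction(0), {
                "z": (Fraction(1), None),
            }),
        }),
    }),
}

def cascade(total, tree):
    """Distribute `total` down the tree and return the target per salesperson."""
    targets = {}
    for name, (share, children) in tree.items():
        amount = total * share
        if children:
            targets.update(cascade(amount, children))
        else:
            targets[name] = amount
    return targets

targets = cascade(ROADMAP_TEU, SHARES)
for person, teu in targets.items():
    print(person, int(teu))
```

Exact fractions are used instead of floating-point shares so that the cascaded targets stay whole numbers of TEUs in this example.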

3.2.3. Do the Targets Assigned to the Salesperson Follow the Company Guidelines?

In this company, targets are set for a salesperson based on 3 pillars:
  • Account Base
  • Sales roadmap
  • Salesperson seniority
As described previously, the Account Base is composed of the customers assigned to the salesperson, and it has a significant impact on the level of the target that can be assigned to that person. If a salesperson has a Customer Base composed of 10 customers, and these customers are likely to purchase 100 TEUs along the year, the targets assigned to this salesperson should not be too far from 100 TEUs, unless the person who defines the targets has information indicating that the customers will have a higher increase.
The Sales roadmap is the document that contains the long-term plan for the company’s sales growth. For the company in question, this document is composed of the main product categories, regions, and trade lanes, among other information. Sales managers often set targets based on the sales roadmap alone, but this may lead to “unrealistic” targets if the Account Base does not provide the potential needed to achieve them. When this happens, CRM Pipeline figures are another ally in setting the targets. Usually, to improve target setting, Pipeline figures are added to the Sales Planning process. This way, the salesperson and manager have not only the Customer Base line but also the forecast (assuming good forecasting accuracy).
The salesperson’s seniority also plays a significant role in how the salesperson works the Customer Base. A junior salesperson may not have the ability to manage complex accounts. Therefore, the sales manager has to know the salesperson’s seniority when assigning the Customer Base. Seniority in the company/products also has consequences for managing the Account Base. For instance, somebody who has just joined the company and is also junior (young) will need “more” time to start generating results: new company, new products, and the need to build an internal network, among other relevant tasks. To mitigate this issue, sales managers often give a new/junior salesperson lower targets in the beginning and then increase the targets year by year as seniority increases.
Figure 3 displays an example of target definition for one salesperson (the name was replaced by a randomly generated one for data protection). It shows a 15% increase over the Account Base line (identified as the Full Year Actual Adjusted), which is 283 TEUs. The increase amounts to 42 more TEUs and is split across the 4 quarters as 10 for Q1, 10 for Q2, 11 for Q3, and 11 for Q4.
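The arithmetic of this example can be reproduced with a short Python sketch. The rounding rule (truncating 42.45 to 42) and the rule that the later quarters absorb the remainder are assumptions inferred from the figures quoted above; the text does not state the company's actual policy, and the function name is hypothetical.

```python
def quarterly_split(base_line_teu, increase_pct):
    """Return (extra TEUs, per-quarter split) for a percentage increase.

    Assumptions: the increase is truncated to whole TEUs, and any remainder
    of the division by 4 is assigned to the last quarters of the year.
    """
    extra = base_line_teu * increase_pct // 100   # 283 * 15 // 100 = 42
    per_quarter, remainder = divmod(extra, 4)     # 42 = 4 * 10 + 2
    quarters = [
        per_quarter + (1 if q >= 4 - remainder else 0)
        for q in range(4)
    ]
    return extra, quarters

extra, quarters = quarterly_split(283, 15)
print(extra, quarters)  # 42 [10, 10, 11, 11]
```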
Pipeline and seniority data are entirely missing from this research, so the targets are judged by comparing them directly with the Customer Base line in the dataset.
In this dataset, the evaluation is made by dividing the targets by the Customer Base line, as displayed in Formula (1).
$$\text{Target evaluation} = \frac{\text{Target}}{\text{Account Base line}} \tag{1}$$
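A minimal Python sketch of Formula (1) follows. The function name and the sample figures are hypothetical, and returning None for a base line of 0 (for example, a Customer Base made up only of new customers) is an assumption, since the text does not say how that case is handled.

```python
def target_evaluation(target_teu, base_line_teu):
    """Ratio of the assigned target to the Account Base line (Formula (1))."""
    if base_line_teu == 0:
        # Assumption: the ratio is undefined when there is no base line.
        return None
    return target_teu / base_line_teu

# Hypothetical figures: a target of 115 TEUs on a base line of 100 TEUs.
print(target_evaluation(115, 100))
print(target_evaluation(50, 0))
```

A ratio close to 1 means the target is close to the volume sold to the same customers in the previous year, while a much larger ratio flags a target that the Account Base may not support.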

3.2.4. Other Relevant KPIs for the Salesperson Performance

There are other KPIs that need to be checked for the salesperson to measure the performance. These include:
  • Customer Base: the number of customers assigned to the salesperson
  • Customer Base line: the TEUs sold in the previous year to all the customers in the Customer Base
  • Number of Opportunities created by the salesperson
  • Average number of different Opportunities per customer
  • Growth variability along the year (number of months with positive growth)
  • The salesperson’s ability to bring growth with different products
  • The number of products with positive growth along the year
The table in Figure 4 provides all this information for a sample of 5 salespeople. Worth highlighting in the table is the number of opportunities of the first salesperson, which is remarkably high when compared to the second salesperson. Another important piece of information is the average number of months with growth above 0: on average, Bella Connor (Belle) is able to grow the Customer Base for about 8 months each year, and she can also grow more than one product.
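Two of the KPIs listed above, the number of months with positive growth and the number of products with positive growth, are simple counts. The Python sketch below shows one way to compute them; the monthly volumes, product names, and function names are hypothetical and are not taken from Figure 4.

```python
def months_with_positive_growth(monthly_current, monthly_previous):
    """Count the months in which the current volume exceeds last year's."""
    return sum(
        1
        for current, previous in zip(monthly_current, monthly_previous)
        if current - previous > 0
    )

def products_with_positive_growth(growth_by_product):
    """Count the products whose yearly growth is above 0."""
    return sum(1 for growth in growth_by_product.values() if growth > 0)

# Hypothetical monthly TEU volumes for one salesperson (January to December).
previous_year = [10, 12, 8, 9, 11, 10, 7, 9, 10, 12, 11, 10]
current_year = [12, 11, 9, 13, 11, 14, 9, 8, 12, 15, 10, 13]

# Hypothetical yearly growth in TEUs per product.
growth_by_product = {
    "Ocean Import": 25,
    "Ocean Export": -4,
    "Air Import": 6,
    "Air Export": 0,
}

positive_months = months_with_positive_growth(current_year, previous_year)
positive_products = products_with_positive_growth(growth_by_product)
print(positive_months, positive_products)
```

A month with unchanged volume is not counted as positive growth in this sketch, which matches the "growth above 0" wording used in the text.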
These rules generate a dataset of 42 KPIs, from which a classification can be defined for the salesperson based on the accumulated performance on each of the measures. The classifications are divided into the following categories: Not Performing, Good, and Outstanding.

4. Work Methodology

The work methodology used in this research was the Cross Industry Standard Process for Data Mining (CRISP-DM). This methodology, as presented in Figure 5, is divided into 6 stages. In this article, the authors describe the steps executed in stages 1 to 5. The last stage is not described, as the company requested that no information be provided on that area.
Each of the CRISP-DM steps taken during this research is described below.

4.1. Business Understanding

4.1.1. Objectives

With the main goal of classifying salespeople and building a model that can tell whether a salesperson is successful or not, this research project has the following business objectives:
  • Identify the factors that contribute to the success of salespeople, based on the provided data
  • Use a predictive analytics process to classify salespeople into the 3 classes specified by the business (Not Performing, Good, and Outstanding)

4.1.2. Business Success Criteria

The main success criterion for this research project is the ability to achieve the specific goals defined previously in the objectives. To evaluate these goals, the authors used the metrics provided by algorithms that measure the accuracy of the classifications.
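The abstract names a confusion matrix, accuracy, and F1 score as the measures used. The Python sketch below shows how these can be computed for the three classes; it is an illustration rather than the authors' R evaluation code, and the actual and predicted labels are invented.

```python
from collections import Counter

CLASSES = ["Not Performing", "Good", "Outstanding"]

def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Share of all cases that lie on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def f1_per_class(matrix, classes=CLASSES):
    """F1 = 2TP / (2TP + FP + FN), computed one class against the rest."""
    scores = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(classes))) - tp
        fn = sum(matrix[i]) - tp
        denominator = 2 * tp + fp + fn
        scores[name] = 2 * tp / denominator if denominator else 0.0
    return scores

# Hypothetical labels for eight salesperson-year rows.
actual = ["Good", "Good", "Not Performing", "Outstanding",
          "Good", "Not Performing", "Outstanding", "Good"]
predicted = ["Good", "Outstanding", "Not Performing", "Outstanding",
             "Good", "Good", "Outstanding", "Good"]

matrix = confusion_matrix(actual, predicted)
f1 = f1_per_class(matrix)
print(matrix)
print(accuracy(matrix))
print(f1)
```

Reporting F1 per class alongside the overall accuracy helps when the classes are unevenly represented, because a high accuracy can hide poor results on a small class.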

4.2. Data Understanding

The data used in this research refers to sales between January 2017 and June 2019 from a freight forwarding company that operates worldwide on Air, Ocean, and Land. The sales were made by 594 salespeople. The data refers to shipments and sales opportunities for the customers, grouped by year. As the company does not want its sensitive data made public, all sensitive data were removed from the dataset, leaving only the figures and the classification. The names of the salespeople were all replaced with names generated on a name generator website [31].
There are 1071 rows and 45 columns. Each row represents all the sales, customer base, and sales opportunities made by one salesperson to his or her whole customer base over one year. The dataset has the following structure:
  • Data reflects two and a half years, namely 2017, 2018, and part of 2019
  • Data is grouped by salesperson and year
  • The volumes, base line, growth, target, and achievement are provided in separate columns for each of the six main products sold by the company, and one extra field for the remaining products growth percent grouped together
  • Monthly variability is included in the dataset as the number of months with positive growth
  • The included opportunity information refers only to won and implemented opportunities
  • Targets and achievements are included as volumes and percentages
  • The previous classifications used to train the model fall into 3 classes (Not Performing, Good, and Outstanding)

Data Description

The dataset is publicly provided in the university online database. The data is provided in a CSV file, and the tables below (Table 1 and Table 2) describe the attributes.
For each of the six main products, the fields with performance indicators described in Table 2 are also part of the dataset. Target achievement is calculated with Formula (2):
$$\text{Target Achievement} = \frac{\text{Target}}{\text{Growth}} \qquad (2)$$
A sample of the dataset is shown in Figure 6 for better understanding.
The classifications in the dataset use the categories Not Performing, Good, and Outstanding, which represent the following:
  • Not Performing: someone who has no growth, few or no opportunities created, low target achievement, and low growth over the months of one year
  • Good: someone who was able to grow the base line on at least 2 products, has positive growth for at least 7 months, has some opportunities, and has a good target achievement
  • Outstanding: someone who was able to grow the base line on more than 2 products, or had extremely high growth on one product, and who has positive growth over 8 months or more and a good or high target achievement based on a large base line and high targets

4.3. Data Preparation

The dataset is composed of 45 columns and 1071 rows. Of the 45 columns, four have categorical data: Sales_Person_Code, Sales_Person_Name, Year, and Talent. The remaining columns have numerical data containing the salesperson’s performance. A summary of the data available in the dataset is provided in Table 3 for reference. The columns Sales_Person_Code, Sales_Person_Name, and Year were removed from the dataset, leaving the dataset with 42 columns.
In the next sections, the authors submit the dataset to several techniques that evaluate the importance each column may have to the model, and eliminate the columns that contribute little or nothing. All the evaluations were made using RStudio, and all the packages and functions used are identified.
The dataset contains:
  • 695 rows classified by the business in a column called Talent
  • 376 rows without classification, where the Talent column contains no data
To train the model, the evaluations and transformations below were applied to the 695 classified rows.
The scripts used for this research are written in R, using the free version of RStudio obtained from [32]. These scripts are provided in the university public database.

4.3.1. Near Zero Variance

Columns with low variance provide little or no knowledge to the models, so they can be eliminated to improve the performance of the model. To identify these columns, the authors used the function nearZeroVar from the caret package in R. This function flags predictors that have a single unique value, as well as predictors that have both few unique values relative to the number of samples and a large ratio of the frequency of the most common value to the frequency of the second most common value.
Of the results returned by the function, the most important are zeroVar, which is TRUE when the column contains only one distinct value, and nzv, which is TRUE when the column is a near-zero-variance predictor. The results are provided in Table 4 for reference.
The nearZeroVar function identified 19 columns to be removed. After the removal of these 19 columns, the dataset still has 24 columns: 23 numerical plus the Talent column.
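The near-zero-variance rule described above can be sketched in a few lines. The authors' scripts are in R and are not reproduced in the article, so the following is only a Python illustration, assuming caret's documented default cutoffs (a frequency ratio above 95/5 and fewer than 10% unique values). The function names and the sample column are hypothetical; the two checks at the end use values taken from Table 4.

```python
from collections import Counter

# Assumed cutoffs, matching caret's documented defaults (freqCut, uniqueCut).
FREQ_CUT = 95 / 5
UNIQUE_CUT = 10.0


def is_near_zero_var(freq_ratio, percent_unique):
    """Apply the near-zero-variance rule to two precomputed metrics."""
    return freq_ratio > FREQ_CUT and percent_unique < UNIQUE_CUT


def near_zero_var_metrics(values):
    """Compute the frequency ratio and percentage of unique values of a column."""
    counts = Counter(values).most_common()
    zero_var = len(counts) == 1
    # Count of the most common value over the count of the second most common;
    # set to 0.0 for a constant column (an assumption of this sketch).
    freq_ratio = 0.0 if zero_var else counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return {
        "freq_ratio": freq_ratio,
        "percent_unique": percent_unique,
        "zero_var": zero_var,
        "nzv": zero_var or is_near_zero_var(freq_ratio, percent_unique),
    }


# Hypothetical column: 97 zeros and three non-zero entries.
sparse_column = [0] * 97 + [5, 7, 7]
sparse_metrics = near_zero_var_metrics(sparse_column)

# Two rows of Table 4 checked against the rule.
growth_flag = is_near_zero_var(25.33, 10.94)    # Freight_Management_FCL_Growth
base_line_flag = is_near_zero_var(39.67, 7.48)  # Freight_Management_FCL_Base_Line
```

Under these assumed cutoffs, Freight_Management_FCL_Growth is kept because its percentage of unique values is above 10, while Freight_Management_FCL_Base_Line is flagged, which agrees with the nzv column of Table 4.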

4.3.2. Correlation Matrix

After the removal of the columns with low variance, a correlation matrix was computed for the remaining columns (excluding the Talent column) to find the ones that are highly correlated and remove at least one of them. For that, the authors used the function cor from R’s stats package. The cor function computes the correlation between pairs of columns; the results are correlation coefficients between −1 and 1.
The correlation matrix, presented in Figure 7, shows that there are 6 highly correlated columns (above 0.8). The authors eliminated three of the six columns: Grow_with_Different_Products, Ocean_FCL_Export_Target_Achievement, and Ocean_FCL_Export_Growth_Percent. The dataset now has 21 columns: 20 numeric plus Talent. Only the columns with information specific to a product were removed because, between the columns referring to one product only and the overall columns, the overall columns provided more information to the dataset.
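A correlation screen of this kind can be illustrated as follows. This is a Python sketch rather than the authors' R code: it computes Pearson correlations by hand and lists the column pairs above the 0.8 cutoff mentioned in the text. The column names and values are hypothetical, and comparing the absolute value of the coefficient against the cutoff is an assumption of the sketch.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(columns, cutoff=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the cutoff."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > cutoff:  # assumption: strong negative correlation also counts
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical values for three columns of five salespeople.
columns = {
    "growth": [10, 20, 30, 40, 50],
    "growth_pct": [0.10, 0.22, 0.29, 0.41, 0.50],
    "months_with_growth": [6, 2, 9, 4, 7],
}
pairs = highly_correlated_pairs(columns)
```

In this toy data only the pair (growth, growth_pct) exceeds the cutoff, so one of the two columns would be a candidate for removal.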

4.3.3. Outliers Treatment

After removing the columns that contribute less and the columns that are highly correlated, an outlier analysis was run on the remaining columns of the dataset. At this point, there are 21 columns in the dataset, including the Talent column, which holds the classification.
The dataset has a high number of outliers, as Figure 8 shows. To identify the outliers, the authors used the boxplot.stats function of the grDevices package. This function is typically called by other functions to build a boxplot. With it, it was possible to identify the outliers for all 20 numeric columns.
To avoid removing data from the small dataset (695 rows in the training dataset), the outlier treatment replaced every outlier with the value at the range limit, also obtained with the boxplot.stats function from the grDevices package. The lower and higher values applied are provided in Table 5 for reference; limits were applied to all columns except Nº_Months_with_growth_above_0, which did not need them.
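The capping step can be sketched as below. This is a Python illustration, not the authors' script, and it makes two assumptions: the usual boxplot coefficient of 1.5 times the interquartile range, and quartiles computed with `statistics.quantiles(method="inclusive")`, which can differ slightly from the hinges R uses. The function names and sample values are hypothetical.

```python
from statistics import quantiles


def whisker_limits(values, coef=1.5):
    """Return the lowest and highest data points inside the boxplot fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


def cap_outliers(values, coef=1.5):
    """Replace every value outside the whiskers with the nearest whisker value."""
    low, high = whisker_limits(values, coef)
    return [min(max(v, low), high) for v in values]


# Hypothetical column with one extreme value.
growth = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
limits = whisker_limits(growth)
capped = cap_outliers(growth)
```

Capping keeps all rows in the dataset, which matters for a dataset of this size, at the cost of compressing the extreme values to the whisker limits.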
After all these evaluations, the authors discussed with the business the added value of the columns that refer to specific products, such as Ocean FCL Export and Ocean FCL Import (the added value of Freight Management was practically removed because the outlier treatment eliminated all its values). Keeping these 2 products as the only ones in the model would bias it towards salespeople who succeed more on these 2 products than on the remaining products. As the Overall Growth is still part of the dataset, removing all the product-specific columns would produce similar results with more value to the business. This led to the removal of the other 10 columns, which reduced the dataset to 11 columns: 10 numeric plus 1 categorical.

4.3.4. Normalize Data

After the completion of all the data treatment steps, and as the Naive Bayes (NB) in R requires all the numeric columns to be standardized, the authors standardized all the numeric columns using the normalize function from the BBmisc package.
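Standardization here means centering each numeric column on its mean and scaling it by its standard deviation. A minimal Python sketch of that transformation follows; it is not the authors' R code. It uses the sample standard deviation, and returning zeros for a constant column is an assumption of the sketch. The sample values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Center a numeric column on its mean and scale by its sample std. dev."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical column: months with growth above 0 for five salespeople.
months_with_growth = [3, 6, 9, 12, 0]
scaled = standardize(months_with_growth)
```

After this step every numeric column has mean 0 and unit standard deviation, so columns measured in TEUs and columns measured in months are on a comparable scale.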
With this task completed, the data treatment phase is concluded. The next phase is the evaluation where the results are assessed. This is described in the discussion section.

5. Discussion

5.1. Naive Bayes

From the studied algorithms, the authors selected NB because of its ease of implementation. The NB algorithm is a probabilistic classifier that takes each independent variable and associates it with a conditional probability. The conditional probability is calculated with Formula (3):
$$P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)} \qquad (3)$$
The algorithm calculates the probability that an event occurs, based on another event that occurred in the past. For example, consider predicting whether a salesperson will achieve his or her targets. In the formula, C corresponds to the salesperson achieving the targets, while A corresponds to the conditions that allowed the salesperson to achieve them, for instance, a customer base composed of customers that buy high volumes of TEUs.
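A small worked example may help to read Formula (3). The probabilities below are invented for illustration only and do not come from the dataset: suppose 40% of salespeople achieve their targets, 60% of those who achieve them have a high-volume customer base, and 30% of all salespeople have such a customer base.

```python
def posterior(p_a_given_c, p_c, p_a):
    """Bayes' rule: P(C | A) = P(A | C) * P(C) / P(A)."""
    if not 0 < p_a <= 1:
        raise ValueError("P(A) must be in (0, 1]")
    return p_a_given_c * p_c / p_a


# Hypothetical probabilities, chosen only to illustrate the formula.
p_c = 0.4          # P(C): salesperson achieves the targets
p_a_given_c = 0.6  # P(A | C): high-volume customer base among achievers
p_a = 0.3          # P(A): high-volume customer base overall

p_c_given_a = posterior(p_a_given_c, p_c, p_a)  # approximately 0.8
```

With these numbers, knowing that a salesperson has a high-volume customer base raises the probability of achieving the targets from 0.4 to about 0.8.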
The data was split into 2 separate datasets using the sample function in R: the training dataset with 70% of the data, which corresponds to 481 observations, and the test dataset with 214 observations.
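A seeded random split of this kind can be sketched as follows. This is a Python illustration with a hypothetical function name and seed, not the authors' call to sample. An exact 70% cut of 695 rows gives 486 training rows, so the sketch does not reproduce the reported 481/214 split.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle row indices with a fixed seed and cut them into two parts."""
    rng = random.Random(seed)  # the seed value is an arbitrary assumption
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Stand-in for the 695 classified rows: here each row is just its index.
rows = list(range(695))
train_rows, test_rows = train_test_split(rows)
```

Fixing the seed makes the split reproducible, so the Random Forest and NB models can be trained on exactly the same training rows.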

5.2. Identify Most Important Factors for Salesperson Success

To achieve the goal of identifying the most important factors for salesperson success, the authors built a Random Forest model with the same training dataset prepared for the NB model, using the randomForest function of R so that the varImp function could be used. The Random Forest model was created with the R defaults and the following parameters: type of random forest: classification; number of trees: 500; number of variables tried at each split: 2. The results were an Out of Bag (OOB) estimate of error rate of 2.91% and the confusion matrix provided in Table 6.
The results of the varImp function are provided in Table 7.
The results show that the most important features are:
  • Growth_All_Products
  • Target_Achievement_All_Products
  • Growth_Percent_All_Products
  • Nº_Months_with_growth_above_0
The remaining columns have residual importance compared with the ones mentioned above. The results are in line with the business people’s opinions. To succeed, salespeople have to focus on growing the customer base, work to achieve their targets, and have steady positive growth for as many months as possible.
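To see how dominant the four features are, the scores in Table 7 can be turned into relative shares. The following Python sketch only does arithmetic on the published values; treating the importance scores as additive weights is a simplification assumed here, not something stated in the article.

```python
# Importance scores copied from Table 7.
importance = {
    "Growth_All_Products": 132.94,
    "Target_Achievement_All_Products": 54.46,
    "Growth_Percent_All_Products": 26.35,
    "Nº_Months_with_growth_above_0": 23.59,
    "Target_All_Products": 9.61,
    "Base_line_All_Products": 7.87,
    "Customer_Base_All_Products": 6.06,
    "Nº_Opportunities_created": 4.63,
    "Nº_Different_Products": 4.47,
    "Average_Nº_of_Opportunities_per_customer": 0.22,
}

total = sum(importance.values())
shares = {name: score / total for name, score in importance.items()}

# The four highest-ranked features and their combined share of the total.
top_four = sorted(importance, key=importance.get, reverse=True)[:4]
top_four_share = sum(shares[name] for name in top_four)
```

Under this simplification the four listed features account for roughly 88% of the summed importance, which is consistent with describing the remaining columns as residual.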

5.3. Run the Classification

The authors created a 20-fold cross-validation NB model based on the trainControl function from the caret package. This model was then used to generate predictions for the test dataset.
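The idea behind 20-fold cross-validation is that the training rows are divided into 20 parts, and each part is used once for validation while the other 19 are used for fitting. The Python sketch below shows only this partitioning for the 481 training observations; it is not caret's implementation, and the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n_rows, k=20, seed=1):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    rng = random.Random(seed)  # arbitrary seed, assumed for reproducibility
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 481 is the size of the training dataset reported in the text.
splits = list(k_fold_indices(481, k=20))
fold_sizes = sorted({len(validation) for _, validation in splits})
```

With 481 rows and 20 folds, each validation fold holds 24 or 25 rows, and every row is used for validation exactly once.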
A confusion matrix was built to evaluate the performance of the predictions made over the test dataset. The results are displayed in Table 8.
The Accuracy (average) of the model is 92.52%. Based on the confusion matrix in Table 8, the model failed in only about 7.5% of the cases.
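The reported accuracy can be cross-checked against Table 8 by dividing the sum of the diagonal cells by the sum of all cells. The Python sketch below does this with the published values; because the cells are rounded to one decimal place, the result only approximates the reported 92.52%.

```python
# Cell values copied from Table 8 (class order: Good, Not Performing, Outstanding).
table_8 = [
    [35.80, 2.70, 0.80],
    [2.30, 49.50, 0.00],
    [1.70, 0.00, 7.30],
]

correct = sum(table_8[i][i] for i in range(len(table_8)))  # diagonal cells
total = sum(sum(row) for row in table_8)                   # all cells
accuracy = correct / total
error_rate = 1 - accuracy
```

The diagonal sums to 92.6 out of a total of 100.1, which gives an accuracy of about 92.5% and an error rate of about 7.5%.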
Precision, Specificity, Sensitivity, and the F1 score were also evaluated to assess the model accuracy and the results. As Table 9 shows, the Outstanding class has a high Specificity but a lower Sensitivity.
The F1 scores show that the Not Performing class scores highest, while the scores for the Outstanding and Good classes are also high, which is very important considering that the results of this model are used to evaluate people’s performance. The dataset used in this analysis has 695 observations, distributed across the classes as follows: Good 269, Not Performing 373, and Outstanding 53. The Detection Rate scores reflect the high number of correct predictions for each class and, when compared with the Detection Prevalence, confirm the small number of erroneous predictions.
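The per-class scores named above are all derived from a confusion matrix. The Python sketch below shows the standard one-versus-rest definitions. As sample input it uses the Random Forest counts from Table 6, not the NB results of Table 9, and it assumes that rows hold the actual classes and columns the predicted ones, which is consistent with the per-row class error in that table.

```python
def per_class_metrics(matrix, labels):
    """One-vs-rest precision, sensitivity, specificity and F1 for each class.

    Assumes matrix[i][j] counts rows of actual class i predicted as class j,
    and that every class is predicted at least once (no zero denominators).
    """
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # actual k, predicted other
        fp = sum(row[k] for row in matrix) - tp  # predicted k, actual other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp)
        sensitivity = tp / (tp + fn)
        metrics[label] = {
            "precision": precision,
            "sensitivity": sensitivity,
            "specificity": tn / (tn + fp),
            "f1": 2 * precision * sensitivity / (precision + sensitivity),
        }
    return metrics


# Counts copied from Table 6 (Random Forest), used only as sample input.
labels = ["Good", "Not Performing", "Outstanding"]
table_6 = [
    [183, 0, 8],
    [1, 250, 0],
    [7, 0, 32],
]
scores = per_class_metrics(table_6, labels)
```

On this sample input the Outstanding class shows the same pattern as described for the NB model: a high specificity combined with a lower sensitivity, because it is the smallest class.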
The limitation of this work was the data size and availability, as the number of observations available is not high and the number of observations differs between the available classes. The authors believe that with a larger dataset, from which a similar number of observations could be extracted for each class, the erroneous cases would decrease, leading to a more accurate model.
As an example, Figure 9, Figure 10 and Figure 11 show the results of the assessment in Power BI, on a dashboard created for salesperson assessment. The dashboard has all the metrics and the classification made by the predictive analytics as Not Performing, Good, or Outstanding. With this, all the objectives of the research were achieved.
The steps presented above conclude the evaluation of the model performance, which was the last task in the research. In the next sections, the authors conclude the research with a summary of the work and suggestions for future work.

6. Conclusions

In this work, the authors applied a Naive Bayes model to classify salespeople into pre-defined categories provided by the business. The classification is done in 3 classes: Not Performing, Good, and Outstanding. The classification was based on KPIs such as growth volume and percentage, sales variability over the year, opportunities created, customer base line, and target achievement, among others.
The dataset comprises 594 salespeople classified into three categories:
  • Not Performing: someone who has no growth, few or no opportunities created, low target achievement, and low growth over the months of one year
  • Good: someone who was able to grow the base line on at least 2 products, has positive growth for at least 7 months, has some opportunities, and has a good target achievement
  • Outstanding: someone who was able to grow the base line on more than 2 products, or had extremely high growth on one product, and who has positive growth over 8 months or more and a good or high target achievement based on a large base line and high targets
The dataset initially had 45 columns. It was then reduced to 11 columns, based on several techniques to clean the data and evaluate the relevance of the columns for classifying a salesperson’s success. In this process, the authors also identified the most critical factors for evaluating a salesperson’s performance based on the data: growth amount on all the products, target achievement on all the products, growth percentage on all the products, and the number of months with growth above 0.
The model was evaluated with a confusion matrix and other techniques like True Positives, True Negatives, and F1 score. The results showed an Accuracy (average) of 92.52% for the whole model. In terms of precision, Not Performing has 90%, Good 87%, and Outstanding 100%. The F1 scores were 94% for Not Performing, 86% for Good, and 80% for Outstanding.
The accuracy results in this work are high because, for the size of the dataset, the variations in the data behave similarly within each of the classes. For instance, a not performing salesperson has, most of the time, low growth, a low number of opportunities, and sales above 0 for a small number of months in one year; a good salesperson may have high growth in at least six months over one product; and an outstanding salesperson should have extremely high growth for at least one product and growth above 0 for at least eight months.
This approach, when data is available, can help produce new guidelines that HR, with pre-defined rules, can use to automate part of the performance appraisal process. It can be applied to other cases and companies and, with DM, start automating the analysis of complex, interrelated KPIs to generate a classification.

Future Work

As future work, the authors propose using an NB model to evaluate salespeople’s performance with more CRM information, taking advantage of other information that is also part of the salesperson’s job, such as the number of leads, activities (visits, calls), the other opportunity states, the opportunity conversion rate, and the costs involved for each salesperson. Subjective factors can also be included in the salesperson’s performance. For instance, a more experienced salesperson may be training a junior salesperson, or may have been assigned several lost customers to recover. These facts can have an impact on the salesperson’s sales performance, so flags that rate them could also be included.
All of this aims at a detailed and precise evaluation of salespeople’s performance, increasing fairness and drastically reducing the amount of work needed to carry out a performance evaluation for the salesperson.

Author Contributions

N.C. is a Master’s student who performed all development work. J.F. is a thesis supervisor and organized all work in the computer science subject. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been partially supported by Portuguese National funds through FITEC programa Interface, with reference CIT “INOV—INESC Inovação—Financiamento Base”.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
B2C | Business to Consumer
B2B | Business to Business
CRM | Customer Relationship Management
MARS | Multivariate Adaptive Regression Splines
SVR | Support Vector Regression
SVM | Support Vector Machines
3F | Fast Fashion Forecasting
ELM | Extreme Learning Machine
GM | Grey Model
HRPA | Human Resource Predictive Analytics
US | United States
FDMA | Fuzzy Data Mining Algorithm
OOB | Out of Bag
CRISP-DM | Cross Industry Standard Process for Data Mining
DM | Data Mining
KPI | Key Performance Indicator
ERP | Enterprise Resource Planning
NB | Naive Bayes
HR | Human Resources
TEU | Twenty-foot equivalent unit

References

  1. Jain, N.; Srivastava, V. Data mining techniques: A survey paper. Int. J. Res. Eng. Technol. 2013, 2, 2319-1163.
  2. Rich, G.A.; Bommer, W.H.; MacKenzie, S.B.; Podsakoff, P.M.; Johnson, J.L. Apples and apples or apples and oranges? A meta-analysis of objective and subjective measures of salesperson performance. J. Pers. Sell. Sales Manag. 1999, 19, 41–52.
  3. Reday, P.A.; Marshall, R.; Parasuraman, A. An interdisciplinary approach to assessing the characteristics and sales potential of modern salespeople. Ind. Mark. Manag. 2009, 38, 838–844.
  4. Kesari, B. Salesperson Performance Evaluation: A Systematic Approach to Refining the Sales Force. Int. J. Multidiscip. Manag. Stud. 2014, 4, 49–66.
  5. Amin, M.; Hossain, M.; Islam, M. Evaluating the Effectiveness of Weighted Checklist Method as a Tool of Employee Performance Appraisal: Evidence from Prime Bank Limited. Stamford J. Bus. Stud. 2015, 6-II, 32–47.
  6. Campbell, J.P.; Wiernik, B.M. The Modeling and Assessment of Work Performance. Annu. Rev. Organ. Psychol. Organ. Behav. 2015, 2, 47–74.
  7. Campbell, J.P.; McCloy, R.A.; Oppler, S.H.; Sager, C.E. A theory of performance. Pers. Sel. Organ. 1993, 3570, 35–70.
  8. Levy, M.; Sharma, A. Relationships among measures of retail salesperson performance. J. Acad. Mark. Sci. 1993, 21, 231–238.
  9. Landy, F.J.; Farr, J.L. The Measurement of Work Performance: Methods, Theory, and Applications; Academic Press: Cambridge, MA, USA, 1983.
  10. Cannon, J.P.; Spiro, R. The Measurement of Salesperson Performance: Comparing Self-Evaluations with Customer Evaluations. In Enhancing Knowledge Development in Marketing—1991 Summer Educators’ Proceedings; American Marketing Association: Chicago, IL, USA, 1991; pp. 1–10.
  11. Waller, M.A.; Fawcett, S.E. Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management. J. Bus. Logist. 2013, 34, 77–84.
  12. Ryu, S.; Siegel, E. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die Book Review; Healthc Inform Research; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2013; Volume 19, pp. 63–65.
  13. Kawas, B.; Squillante, M.S.; Subramanian, D.; Varshney, K.R. Prescriptive Analytics for Allocating Sales Teams to Opportunities. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, 7–10 December 2013; pp. 211–218.
  14. Domingos, R.; Van de Merckt, T. Best Practices for Predictive Analytics in B2B Financial Services. In Proceedings of the 2010 Conference on Data Mining for Business Applications, August 2010; pp. 35–48. Available online: https://0-dl-acm-org.brum.beds.ac.uk/doi/10.5555/1893248.1893253 (accessed on 1 February 2020).
  15. Bose, R. Advanced analytics: Opportunities and challenges. Ind. Manag. Data Syst. 2009, 109, 155–172.
  16. Lilien, G.L. The B2B Knowledge Gap. Int. J. Res. Mark. 2016, 33, 543–556.
  17. Mirzaei, T.; Iyer, L. Application of predictive analytics in customer relationship management: A literature review and classification. In Proceedings of the Southern Association for Information Systems Conference, Macon, GA, USA, 21–22 March 2014; pp. 1–7.
  18. Lu, C.J. Sales forecasting of computer products based on variable selection scheme and support vector regression. Neurocomputing 2014, 128, 491–499.
  19. Choi, T.M.; Hui, C.L.; Liu, N.; Ng, S.F.; Yu, Y. Fast fashion sales forecasting with limited data and time. Decis. Support Syst. 2014, 59, 84–92.
  20. Yu, X.; Qi, Z.; Zhao, Y. Support Vector Regression for Newspaper/Magazine Sales Forecasting; Procedia Computer Science; Elsevier: Amsterdam, The Netherlands, 2013; Volume 17, pp. 1055–1062.
  21. Yan, J.; Zhang, C.; Zha, H.; Gong, M.; Sun, C.; Huang, J.; Chu, S.; Yang, X. On machine learning towards predictive sales pipeline analytics. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
  22. Malisetty, S.; Archana, R.; Kumari, V. Predictive Analytics in HR Management. Indian J. Public Health Res. Dev. 2017, 8, 115.
  23. Mishra, S.N.; Lama, D.R.; Pal, Y. Human Resource Predictive Analytics (HRPA) For HR Management in Organizations. Int. J. Sci. Technol. Res. 2019, 5, 3.
  24. Kessler, R.; Torres-Moreno, J.M.; El-Bèze, M. E-Gen: Automatic Job Offer Processing System for Human Resources. In MICAI 2007: Advances in Artificial Intelligence; Gelbukh, A., Kuri Morales, Á.F., Eds.; Springer: Berlin, Germany, 2007; pp. 985–995.
  25. Menon, V.M.; Rahulnath, H.A. A novel approach to evaluate and rank candidates in a recruitment process by estimating emotional intelligence through social media data. In Proceedings of the 2016 International Conference on Next Generation Intelligent Systems (ICNGIS), Kottayam, India, 1–3 September 2016; pp. 1–6.
  26. Faliagka, E.; Ramantas, K.; Tsakalidis, A.; Tzimas, G. Application of machine learning algorithms to an online recruitment system. In Proceedings of the International Conference on Internet and Web Applications and Services, Stuttgart, Germany, 27 May–1 June 2012.
  27. Zhao, X. A Study of Performance Evaluation of HRM: Based on Data Mining. In Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering, Leicestershire, UK, 20 November 2008; pp. 45–48.
  28. Jing, H. Application of Fuzzy Data Mining Algorithm in Performance Evaluation of Human Resource. In Proceedings of the 2009 International Forum on Computer Science-Technology and Applications, Chongqing, China, 25–27 December 2009; Volume 1, pp. 343–346.
  29. Xiaofan, C.; Fengbin, W. Application of Data Mining on Enterprise Human Resource Performance Management. In Proceedings of the 2010 3rd International Conference on Information Management, Innovation Management and Industrial Engineering, Kunming, China, 26–28 November 2010; Volume 2, pp. 151–153.
  30. Wikipedia. Cross-Industry Standard Process for Data Mining. 2020. Available online: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining (accessed on 12 May 2020).
  31. Masterpiece Generator. Name Generator. 2020. Available online: https://www.name-generator.org.uk/ (accessed on 1 February 2020).
  32. RStudio.com. R Studio. 2019. Available online: https://rstudio.com/ (accessed on 22 October 2019).
Figure 1. Company’s performance evaluation stages.
Figure 2. Target definition Top Down.
Figure 3. Example of target setting in Incentive Management System.
Figure 4. Other relevant KPIs for 5 salespeople sorted by growth.
Figure 5. CRISP-DM Methodology adapted from [30].
Figure 6. Sample of report data.
Figure 7. Correlation matrix.
Figure 8. Outlier display.
Figure 9. Dashboard for a salesperson classified as Not Performing.
Figure 10. Dashboard for a salesperson classified as Good.
Figure 11. Dashboard for a salesperson classified as Outstanding.
Table 1. Main attributes description.
Attribute Name | Description
Talent | The classification applied to the salesperson based on his or her performance for one year
Sales Person Code | Internal identification of the salesperson
Sales Person Name | Name of the salesperson
Year | Year the data refers to
Growth All Products | The growth brought by the salesperson in that year, for all the products together (the growth calculation process is explained in Section 3.1)
Customer Base All Products | Number of customers assigned to the salesperson for each year
Base Line All Products | TEUs sold for all the Customer Base of the salesperson in the previous year, for all the products
Growth Percent All Products | The result of the Growth All Products divided by the Base Line All Products
Target All Products | Sum of all the targets defined for each salesperson in one year, for all the products together
Target Achievement All Products | Target achievement for all the products, obtained by dividing the Target All Products by the Growth All Products, using the formula displayed in (2)
Number of Opportunities created | Count of opportunities won by the salesperson for each year
Average Opportunities per Customer | Average number of won opportunities per customer for each year
Nº Months with growth above 0 | Number of months with growth above 0, for each year (for all the products together)
Number of Different Products with Growth | Count of the number of products that the salesperson was able to grow in one year
Growth with Different Products | Yes/No field identifying whether the salesperson was able to grow more than one product in the year
Table 2. Product attributes description.
Attribute Name | Description
Target | The defined Target for the whole year
Growth | The sum of all the growth for the year
Base line | The sum of all the previous year TEUs for all the Customer Base of one year
Growth Percent | The result of the column Growth for the product in question divided by the Base line
Target Achievement | Calculation of the Target Achievement for the year, using the formula displayed in (2)
Table 3. Table with classification statistics.
Field | Sample Value | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max
Talent | Good | Not Performing: 373, Good: 269, and Outstanding: 53
Growth_All_Products−46−1790−26.537138.4205.512,617
Customer_Base_All_Products4171415.972283
Base_Line_All_Products78040.5204633.4631.512,443
Growth_Percent_All_Products−0.59−10.170.183.171.08566
Target_All_Products00120272405.75606200
Target_Achievement_All_Products0−293−0.090.11−1.070.6688
Nº_Opportunities_created0021017.8324202
Average_Nº_of_Opportunities_per_customer00111.13116
Nº_Months_with_growth_above_030365.87912
Nº_Different_Products00111.3026
Grow_with_Different_Products00000.3511
Ocean_FCL_Import_Target000101220.903002250
Ocean_FCL_Import_Growth0−21930072.88873423
Ocean_FCL_Import_Base_Line00037303.8023312,199
Ocean_FCL_Import_Growth_Percent0−1001.851110,429
Ocean_FCL_Import_Target_Achievement0−185.50000.030.4188
Ocean_FCL_Export_Target0010100177.502406200
Ocean_FCL_Export_Growth−46−1790−7.5035.07563224
Ocean_FCL_Export_Base_Line780028279.50195.508413
Ocean_FCL_Export_Growth_Percent−0.59−1−0.1802.271517.67
Ocean_FCL_Export_Target_Achievement0−293−0.030−0.990.3327
Reefer_Export_Target00004.9402000
Reefer_Export_Growth0−771002.4801923
Reefer_Export_Base_Line000016.4105585
Reefer_Export_Growth_Percent0−1000.10046.8
Reefer_Export_Target_Achievement0−2000.0209.36
Ocean_FCL_Cross_Trade_Target00001.490229
Ocean_FCL_Cross_Trade_Growth0−320002.410706
Ocean_FCL_Cross_Trade_Base_Line00004.0701270
Ocean_FCL_Cross_Trade_Growth_Percent0−1000.24045.56
Ocean_FCL_Cross_Trade_Target_Achievement0−3.700003.43
Freight_Management_FCL_Target00000.760530
Freight_Management_FCL_Growth0−5950023.99012,391
Freight_Management_FCL_Base_Line000024.0704858
Freight_Management_FCL_Growth_Percent0−1000.24055.50
Freight_Management_FCL_Target_Achievement0000000.51
Reefer_Import_Target00000.11024
Reefer_Import_Growth0−186000.910172
Reefer_Import_Base_Line00003.110501
Reefer_Import_Growth_Percent0−1000.340155
Reefer_Import_Target_Achievement0−0.6500000.58
Remaining_Products_Growth_Percent0−389000.640408
Table 4. Result of the nearZeroVar function.

| Column | FreqRatio | PercentUnique | ZeroVar | nzv |
|---|---|---|---|---|
| Talent | 1.39 | 0.43 | FALSE | FALSE |
| Growth_All_Products | 1.14 | 66.91 | FALSE | FALSE |
| Customer_Base_All_Products | 1.03 | 7.77 | FALSE | FALSE |
| Base_Line_All_Products | 3.91 | 64.89 | FALSE | FALSE |
| Growth_Percent_All_Products | 9.20 | 90.50 | FALSE | FALSE |
| Target_All_Products | 1.64 | 38.99 | FALSE | FALSE |
| Target_Achievement_All_Products | 14.67 | 89.93 | FALSE | FALSE |
| Nº_Opportunities_created | 3.25 | 12.52 | FALSE | FALSE |
| Average_Nº_of_Opportunities_per_customer | 3.83 | 1.29 | FALSE | FALSE |
| Nº_Months_with_growth_above_0 | 1.00 | 1.87 | FALSE | FALSE |
| Nº_Different_Products | 1.97 | 1.01 | FALSE | FALSE |
| Grow_with_Different_Products | 1.87 | 0.29 | FALSE | FALSE |
| Ocean_FCL_Import_Target | 7.89 | 28.63 | FALSE | FALSE |
| Ocean_FCL_Import_Growth | 25.25 | 46.91 | FALSE | FALSE |
| Ocean_FCL_Import_Base_Line | 25.22 | 44.60 | FALSE | FALSE |
| Ocean_FCL_Import_Growth_Percent | 5.61 | 64.17 | FALSE | FALSE |
| Ocean_FCL_Import_Target_Achievement | 80.67 | 63.45 | FALSE | FALSE |
| Ocean_FCL_Export_Target | 5.23 | 28.78 | FALSE | FALSE |
| Ocean_FCL_Export_Growth | 12.83 | 44.75 | FALSE | FALSE |
| Ocean_FCL_Export_Base_Line | 14.07 | 41.29 | FALSE | FALSE |
| Ocean_FCL_Export_Growth_Percent | 2.41 | 62.88 | FALSE | FALSE |
| Ocean_FCL_Export_Target_Achievement | 48.50 | 65.61 | FALSE | FALSE |
| Reefer_Export_Target | 136.40 | 1.44 | FALSE | TRUE |
| Reefer_Export_Growth | 218.33 | 4.75 | FALSE | TRUE |
| Reefer_Export_Base_Line | 221.33 | 3.60 | FALSE | TRUE |
| Reefer_Export_Growth_Percent | 54.58 | 2.88 | FALSE | TRUE |
| Reefer_Export_Target_Achievement | 687.00 | 1.29 | FALSE | TRUE |
| Ocean_FCL_Cross_Trade_Target | 136.00 | 1.58 | FALSE | TRUE |
| Ocean_FCL_Cross_Trade_Growth | 214.00 | 5.47 | FALSE | TRUE |
| Ocean_FCL_Cross_Trade_Base_Line | 163.75 | 3.74 | FALSE | TRUE |
| Ocean_FCL_Cross_Trade_Growth_Percent | 49.38 | 4.60 | FALSE | TRUE |
| Ocean_FCL_Cross_Trade_Target_Achievement | 685.00 | 1.58 | FALSE | TRUE |
| Freight_Management_FCL_Target | 694.00 | 0.29 | FALSE | TRUE |
| Freight_Management_FCL_Growth | 25.33 | 10.94 | FALSE | FALSE |
| Freight_Management_FCL_Base_Line | 39.67 | 7.48 | FALSE | TRUE |
| Freight_Management_FCL_Growth_Percent | 7.82 | 8.20 | FALSE | FALSE |
| Freight_Management_FCL_Target_Achievement | 694.00 | 0.29 | FALSE | TRUE |
| Reefer_Import_Target | 344.50 | 0.72 | FALSE | TRUE |
| Reefer_Import_Growth | 39.00 | 4.17 | FALSE | TRUE |
| Reefer_Import_Base_Line | 64.80 | 3.45 | FALSE | TRUE |
| Reefer_Import_Growth_Percent | 20.13 | 4.32 | FALSE | TRUE |
| Reefer_Import_Target_Achievement | 691.00 | 0.72 | FALSE | TRUE |
| Remaining_Products_Growth_Percent | 42.86 | 5.61 | FALSE | TRUE |
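
The two statistics in Table 4 can be illustrated with a minimal Python sketch. It assumes the rule documented for caret's `nearZeroVar` with its default cut-offs (`freqCut = 95/5`, `uniqueCut = 10`); the function name `near_zero_var` and the sample column are hypothetical, and the 695-row column length is an assumption inferred from the 0.29 PercentUnique values, not a figure stated in the table.

```python
from collections import Counter


def near_zero_var(values, freq_cut=95 / 5, unique_cut=10.0):
    """Return (freq_ratio, percent_unique, zero_var, nzv) for one column.

    Sketch of the near-zero-variance rule; cut-offs mirror caret's defaults.
    """
    counts = Counter(values).most_common()
    # Share of distinct values among all observations, in percent.
    percent_unique = 100.0 * len(counts) / len(values)
    zero_var = len(counts) == 1
    # Count of the most common value divided by the count of the runner-up.
    freq_ratio = counts[0][1] / counts[1][1] if len(counts) > 1 else 0.0
    nzv = zero_var or (freq_ratio > freq_cut and percent_unique <= unique_cut)
    return freq_ratio, percent_unique, zero_var, nzv


# Hypothetical column: 694 zeros and a single non-zero target (assumed 695 rows).
column = [0] * 694 + [500]
freq_ratio, percent_unique, zero_var, nzv = near_zero_var(column)
print(freq_ratio, round(percent_unique, 2), zero_var, nzv)
```

A column of this shape yields a frequency ratio of 694 and about 0.29% unique values, the same pattern reported for Freight_Management_FCL_Target. Under these thresholds a column is flagged only when both conditions hold, which is consistent with Freight_Management_FCL_Growth (25.33, 10.94) not being flagged.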
Table 5. Outlier conversion table.

| Field | Min Value | Max Value | Applied Lower Value | Applied Higher Value |
|---|---|---|---|---|
| Growth_All_Products | −1790 | 12,617 | −373 | 552 |
| Customer_Base_All_Products | 1 | 83 | 1 | 44 |
| Base_Line_All_Products | 0 | 12,443 | 0 | 1468 |
| Growth_Percent_All_Products | −1 | 566 | −1 | 2.94 |
| Target_All_Products | 0 | 6200 | 0 | 1200 |
| Target_Achievement_All_Products | −29 | 388 | −1.18 | 1.76 |
| Nº_Opportunities_created | 0 | 202 | 0 | 57 |
| Average_Nº_of_Opportunities_per_customer | 0 | 16 | 1 | 1 |
| Nº_Months_with_growth_above_0 | 0 | 12 | 0 | 12 |
| Nº_Different_Products | 0 | 6 | 0 | 3 |
| Ocean_FCL_Import_Target | 0 | 2250 | 0 | 750 |
| Ocean_FCL_Import_Growth | −2193 | 3423 | −129 | 215 |
| Ocean_FCL_Import_Base_Line | 0 | 12,199 | 0 | 571 |
| Ocean_FCL_Import_Growth_Percent | −1 | 110.43 | −1 | 2.47 |
| Ocean_FCL_Import_Target_Achievement | −185.50 | 88 | −0.60 | 1.02 |
| Ocean_FCL_Export_Target | 0 | 6200 | 0 | 583 |
| Ocean_FCL_Export_Growth | −1790 | 3224 | −102 | 148 |
| Ocean_FCL_Export_Base_Line | 0 | 8413 | 0 | 482 |
| Freight_Management_FCL_Growth | −595 | 12,391 | 0 | 0 |
| Freight_Management_FCL_Growth_Percent | −1 | 55.50 | 0 | 0 |
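
Applying the bounds in Table 5 amounts to clipping each field to its lower and upper value. The sketch below is an illustration only: it assumes the Growth_All_Products row reads as minimum −1790, maximum 12,617, lower bound −373 and upper bound 552, and the helper `clip_outliers` and the sample values are hypothetical. How the bounds themselves were derived is not shown in the table, so the code takes them as given.

```python
def clip_outliers(values, lower, upper):
    """Replace values below `lower` or above `upper` with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sample spanning the reported range of Growth_All_Products.
growth_all_products = [-1790, -400, 0, 310, 12617]

# Bounds as read from the Growth_All_Products row of Table 5 (assumed parsing).
print(clip_outliers(growth_all_products, lower=-373, upper=552))
# prints [-373, -373, 0, 310, 552]
```

Values already inside the interval are left unchanged, so only the extreme observations are altered.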
Table 6. Confusion matrix from Random Forest.

| | Good | Not Performing | Outstanding | Class.error |
|---|---|---|---|---|
| Good | 183 | 0 | 8 | 0.04 |
| Not Performing | 1 | 250 | 0 | 0.00 |
| Outstanding | 7 | 0 | 32 | 0.18 |
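
The Class.error column in Table 6 can be reproduced from the counts in each row. The sketch below assumes the usual layout of a random forest confusion matrix, with one row per actual class and the error computed as the share of that row falling outside the diagonal; the function name `class_errors` is hypothetical.

```python
# Counts as read from Table 6: one row per class, one column per assigned class.
confusion = {
    "Good": {"Good": 183, "Not Performing": 0, "Outstanding": 8},
    "Not Performing": {"Good": 1, "Not Performing": 250, "Outstanding": 0},
    "Outstanding": {"Good": 7, "Not Performing": 0, "Outstanding": 32},
}


def class_errors(matrix):
    """Per-class error: 1 minus the diagonal count over the row total."""
    return {
        label: 1 - row[label] / sum(row.values())
        for label, row in matrix.items()
    }


for label, error in class_errors(confusion).items():
    print(f"{label}: {error:.2f}")
```

For example, the Outstanding row gives 7 / 39, which rounds to the 0.18 shown in the table.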
Table 7. Feature importance.

| Feature | Overall |
|---|---|
| Growth_All_Products | 132.94 |
| Target_Achievement_All_Products | 54.46 |
| Growth_Percent_All_Products | 26.35 |
| Nº_Months_with_growth_above_0 | 23.59 |
| Target_All_Products | 9.61 |
| Base_line_All_Products | 7.87 |
| Customer_Base_All_Products | 6.06 |
| Nº_Opportunities_created | 4.63 |
| Nº_Different_Products | 4.47 |
| Average_Nº_of_Opportunities_per_customer | 0.22 |
Table 8. Cross-Validated (20 fold) Confusion Matrix.

| | Good | Not Performing | Outstanding |
|---|---|---|---|
| Good | 35.80 | 2.70 | 0.80 |
| Not Performing | 2.30 | 49.50 | 0.00 |
| Outstanding | 1.70 | 0.00 | 7.30 |
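
The entries of Table 8 sum to approximately 100, which suggests they are percentages of all cross-validated predictions rather than counts. Under that assumption, overall accuracy is the sum of the diagonal, as the short sketch below shows; the variable names are illustrative.

```python
labels = ["Good", "Not Performing", "Outstanding"]

# Cell values as read from Table 8 (assumed to be percentages of all predictions).
matrix = [
    [35.80, 2.70, 0.80],
    [2.30, 49.50, 0.00],
    [1.70, 0.00, 7.30],
]

# Diagonal cells are the cases where both labels agree.
accuracy = sum(matrix[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in matrix)

print(round(accuracy, 2), round(total, 2))
```

The diagonal adds up to about 92.6, in line with the 92.50% accuracy reported in the abstract once rounding of the individual cells is taken into account.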
Table 9. Evaluation scores for NB model.

| | Good | Not Performing | Outstanding |
|---|---|---|---|
| Sensitivity | 85% | 97% | 67% |
| Specificity | 93% | 88% | 100% |
| Pos Pred Value | 87% | 90% | 100% |
| Neg Pred Value | 91% | 97% | 97% |
| Precision | 87% | 90% | 100% |
| Recall | 85% | 97% | 67% |
| F1 | 86% | 94% | 80% |
| Prevalence | 37% | 53% | 10% |
| Detection Rate | 32% | 51% | 7% |
| Detection Prevalence | 36% | 57% | 7% |
| Balanced Accuracy | 88% | 93% | 83% |
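
Several rows of Table 9 are derived from others: F1 is the harmonic mean of precision and recall, and balanced accuracy is the mean of sensitivity and specificity. The sketch below recomputes both from the rounded percentages in the table. Because the inputs are already rounded, the results can differ from the published values by up to about one percentage point; the function names are illustrative.

```python
# Rounded per-class percentages as listed in Table 9.
scores = {
    "Good": {"precision": 87, "recall": 85, "sensitivity": 85, "specificity": 93},
    "Not Performing": {"precision": 90, "recall": 97, "sensitivity": 97, "specificity": 88},
    "Outstanding": {"precision": 100, "recall": 67, "sensitivity": 67, "specificity": 100},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(sensitivity, specificity):
    """Arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


for label, s in scores.items():
    f1 = f1_score(s["precision"], s["recall"])
    bal = balanced_accuracy(s["sensitivity"], s["specificity"])
    print(f"{label}: F1 = {f1:.1f}%, balanced accuracy = {bal:.1f}%")
```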

Calixto, N.; Ferreira, J. Salespeople Performance Evaluation with Predictive Analytics in B2B. Appl. Sci. 2020, 10, 4036. https://0-doi-org.brum.beds.ac.uk/10.3390/app10114036
