Segmentación de mercado: Machine Learning en marketing en contextos de covid-19

Chambi Condori, Pedro Pablo

doi:10.15381/idata.v26i1.23623

Industrial Data

Print version ISSN 1560-9146; online version ISSN 1810-9993

Ind. data vol. 26 no. 1 Lima Jan./Jun. 2023

http://dx.doi.org/10.15381/idata.v26i1.23623

Systems and Information Technology

Market segmentation: Machine Learning in Marketing in the Context of COVID-19

Pedro Pablo Chambi Condori¹
http://orcid.org/0000-0002-8628-6825

¹ PhD. Associate professor at the School of Commercial Engineering of Universidad Nacional Jorge Basadre Grohmann de Tacna (Tacna, Peru). E-mail: pchambic@unjbg.edu.pe

ABSTRACT

The COVID-19 health crisis has led to unprecedented changes in consumer behavior, as consumers now purchase differently and use different means. Consumers are checking and judging products via electronic devices, shaping trends in consumer segments. This research study aimed to use the clustering model with Machine Learning resources in the analysis of clusters as a resource for consumer segmentation, a major component in business marketing management. A 6-question questionnaire was administered to 506 people ranging from 18 to 65 years old to gauge their opinions about going shopping. A dataset was organized using the data collected and processed using RapidMiner Studio 9.10 software. The optimal number of clusters and their components were obtained from the performance indicator provided by Machine Learning.

Keywords: market research; segmentation; artificial intelligence; COVID-19

INTRODUCTION

Clustering is a set of non-inferential descriptive statistical techniques (unsupervised, in machine learning terms) aimed at grouping similar objects in the same cluster, so that the resulting clusters have high internal affinity and are clearly different from one another. In business marketing, market segmentation provides insight into the composition of the market, differentiating groups with similar characteristics and needs. Customer segmentation is a fundamental tool for a company, as it allows for the identification of common and differentiated characteristics within the customer database, which are very useful for the strategic management of customers.

Casco (2020) studied the impact of COVID-19 on consumer behavior, finding that consumers are increasingly using technology to purchase goods and services, as well as delivery services to receive products. Mehta et al. (2020) also studied consumer behavior in the context of COVID-19 and found differences in consumer behavior in ordinary times and in times of crisis; they also identified changing needs due to cultural factors and the age of customers, which have an impact on new patterns of consumer behavior. Eger et al. (2021) studied the trends and impact of COVID-19 on consumer behavior, documenting behavioral patterns over the course of the second wave of the pandemic in the Czech Republic. They found intergenerational similarities and differences.

The coronavirus outbreak proved to be a devastating human tragedy with negative impacts on the global economy that altered marketing concepts, the very context of marketing tactics, and marketing strategies (Kumar et al., 2020). This unprecedented crisis forced people to isolate themselves, disrupting all aspects of their regular activities (Kabir & Madria, 2021). The pandemic has wreaked havoc, and faced with this situation, many researchers from different fields combined their efforts to provide a wide range of solutions, including advancing digital marketing (Rodríguez-Rodríguez et al., 2021). Yıldırım et al. (2021) studied the negative impacts of COVID-19 and found that vulnerability, perceived risk and fear can significantly increase preventive behaviors in consumers, providing evidence of dynamic variation in consumer clustering, which is to be studied through market and customer segmentation for strategic customer management.

The impact of 4th generation technologies on market management, quality configuration, product design and development, and communication with customers is becoming increasingly significant. In this regard, Brei (2020) states that machine learning offers many potential applications in marketing. That is, marketing can be transformed into a more autonomous scientific work through the use of data and the proper formulation of each application. There are two main traditional marketing paradigms: destination marketing and relationship marketing. Each of the many aspects of the two marketing paradigms can be expressed as a machine learning problem, i.e., for each problem, a machine learning model can be built, and model parameters can be set. Artificial intelligence expressions have merited studies because machines with deep learning capabilities can take digital marketing to higher levels and make a difference (Miklosik et al., 2019). Lately, digital marketing has been in a state of constant evolution, introducing new tools for data processing that describe the new consumer habits, where the success or failure of digital communication depends largely on the quality of its content marketing (Baltes, 2015).

The widespread impact of artificial intelligence (AI) and machine learning (ML) on many segments of society has been strongly felt in the field of marketing (Jarek & Mazurek, 2019). After all, machine learning offers a variety of benefits, including the opportunity to use proven techniques for the generalization of the advances achieved by science and scientific research.

According to Ullal et al. (2021), machine learning is a technique that combines science, statistics and computational coding to detect patterns in large volumes of data. Based on this data support, it is possible to predict future estimates (expected sales, market segment attention quota, and customer management under the CRM concept) resulting from the application of mathematical and statistical algorithms. The integration of big data and machine learning techniques is a result of three key factors: the availability of algorithms and data, the increased processing power of computers and the lower cost of storage of digital devices. The speed of developments in computer science and software engineering, however, poses a challenge of ever-increasing importance: identifying and implementing the right combination of hardware platforms and software architectures to ensure the continuity of the integrated model and minimize its obsolescence, mitigating a loss of the processing capacity of the data management platform.

The purpose of this study is to conduct a case study on the application of one of the resources of artificial intelligence in market research, specifically, customer segmentation as a tool for the strategic management of business marketing.

The clustering model was used in this study; the rationale of cluster analysis is to group objects into clusters using clustering algorithms. K-means is the most popular of the clustering algorithms; it groups numerical data, and each cluster has a center called the mean. It is assumed that the number of clusters k is a fixed value.
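The description above can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the RapidMiner implementation used in the study: the function names, the deterministic farthest-first initialization, and the small set of responses are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not taken from the article):
    start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm for a fixed k; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical answers on a 1-7 scale to two statements (illustrative only).
responses = [(1, 2), (2, 1), (1, 1), (6, 7), (7, 6), (7, 7)]
centroids, labels = k_means(responses, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result of k-means depends on the starting centroids, which is why the sketch fixes them deterministically; a production tool would typically try several random starts and keep the best run.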

According to Mahendiran et al. (2012), the k-means algorithm relies on the squared Euclidean distance between two points, in other words, the sum of the squared differences of all the coordinates of the two points, as shown in the following equation:

$$d^2(i, j) = \sum_{k=1}^{p} \left( X_{ik} - X_{jk} \right)^2$$

Where $X_{ik}$ and $X_{jk}$ are the values of points $i$ and $j$ on coordinate $k$ (with $p$ coordinates in total), and the Davies-Bouldin index, which calculates the distances between clusters, is used to evaluate performance.

In their study "Estimating Marketing Component Effects: Double Machine Learning from Targeted Digital Promotions", Ellickson et al. (2022, p. 2) first aimed to demonstrate how machine learning can be leveraged in a unified framework to provide causal estimates of the effect of separate components in a high-dimensional marketing intervention, using only observational data. Next, the methodology was applied to data from 34 distinct e-mail promotions sent by a retailer to more than 1.3 million people on its contact list to highlight quantitative findings that are relevant to both the marketing literature and management practice. Furthermore, Dzyabura and Hauser (2011) assert that heuristic decision rules predict validation decisions better than compensatory rules.

Jordan and Mitchell (2015), on the other hand, assert that machine learning assists computer manufacturers in building intelligent equipment that improves through experience. Machine learning has experienced a significant increase in recent years, developing at the intersection of computer science and quantitative methods, and at the core of artificial intelligence and data processing. The accelerated advancement in machine learning has been driven both by the rapid incursion of innovative algorithms, programming languages, increased processor capacity and the expansion of learning theories, and the steady growth in the accessibility of online data, data repositories, and available low-cost computing resources. The adoption of data-intensive machine learning methods is evident throughout the sciences, business management, commerce, finance and economics, health management, education, machine vision surveillance, and engineering, leading to improved decision quality based on data analytics and providing decision-makers with evidence for quality decision-making.

According to Jain and Kain (2018, p. 131), Machine Learning is the science of getting computers to operate automatically. In the past decade, machine learning has given us self-driving cars, voice recognition apps, effective web search, and an improved understanding of the human genome. Their research paper discussed the method to predict stock market behavior using machine learning algorithms: linear regression, random forest, and multilayer perceptron.

Abakouy et al. (2019, p. 1) conducted a comparative study on the most popular machine learning methods applied to the challenging problem of e-mail marketing personalization. Subject and sender lines have a strong influence on click rates of e-mails, as customers often open and click on e-mails based on the subject and sender. According to Kaličanin et al. (2019, p. 472), artificial intelligence (AI) impacts many aspects of life through smart devices and applications, designed to understand consumer behavior, needs, and preferences in order to deliver personalized experiences. AI has been one of the primary drivers of innovation in marketing. Marketers are already leveraging the advantages of AI to gain valuable insights into customers, competitors, and markets. AI automates tasks, reduces costs, and improves workflows. Their paper examines the current and potential applications of AI within marketing by providing a comprehensive overview of existing academic research.

On the intelligent marketing approach, Mahajan et al. (2017, p. 21929) state that AI is a state-of-the-art marketing management support system for dealing with knowledge using machine learning and other soft computing techniques. The range of potential applications of machine learning techniques in marketing management includes consumer behavior, product market structure optimization, marketing mix management, strategic marketing, and the finance domain, among others. They describe the synergy between marketing and intelligent systems, especially machine learning techniques.

Regarding the evolution of digital marketing, which was further leveraged in the COVID-19 health crisis context, Bayoude et al. (2018, p. 373) argued that digital marketing is constantly evolving: new tools are regularly introduced in line with new consumer habits and the multiplication of data, often forcing marketers to delve into too much data that may not even provide them with the overview they need to make business decisions. Following the revolution of machine learning technology in other real-world applications, machine learning is changing the digital marketing landscape. As of 2018, marketing organizations were implementing or expanding their use of machine learning, which makes it easier to predict and analyze consumer behavior with great accuracy. In their paper, they proposed the use of state-of-the-art and the most commonly used machine learning models in various digital marketing strategies. They also showed how machine learning tools can be used on a large scale for marketing purposes by analyzing extremely large data sets. The way ML is integrated into digital marketing practices helps marketers to better understand target consumers and optimize their interactions with them.

The application of artificial intelligence (AI) in marketing aims to continuously track and forecast the next purchasing decisions of target consumers and improve their consumer “journey”. In this regard, MR (2021) states that the potential of AI is reflected in its core elements: big data, machine learning and powerful solutions. Big data enables marketers to aggregate and segment large amounts of data with minimal manual work. Using such data, they will be confident that they will deliver the right message to the right individuals in the right circumstances, through the channel of their choice. Machine learning (deep learning) enables marketers to understand and draw logical conclusions from large collections of data. They can predict consumer trends, track and quantitatively analyze consumer purchases, and forecast the next consumer behavior. We live in an era where machines truly understand the world in the same way that humans do. Machines can easily recognize concepts and themes in a variety of data, interpret human emotions and communications, and generate appropriate responses to consumers. They can easily predict buyer behavior and decisions and use that data to solve problems in the future. In the coming years, marketers can expect a greater impact of AI, through more intelligent search, smarter ads, refined content delivery, reliance on bots, continuous learning, fraud and data breach prevention, sentiment analysis, image and voice recognition, sales forecasting, language recognition, predictive customer service, customer segmentation, and more (MR, 2021). In addition, they may discover a stronger future relationship between marketers and artificial intelligence machines.

Prompted by the COVID-19 health crisis, it is undeniable that the world is rapidly shifting to a digital age. People consume more digital content on a daily basis as they spend more time online. Digital tools and the sites they use play an increasingly significant role in their lives. Smart marketing companies recognize this fact and incorporate the concept of artificial intelligence into their marketing strategies. Thiraviyam (2018) claims that artificial intelligence is revolutionizing the marketing performance universe today. Big data and the exponential growth of computing power have paved the way for the takeoff of artificial intelligence. As advances continue to be made in machine learning, neural networks and deep learning technology, more companies are turning to artificial intelligence to make their operations smarter and more efficient. The marketing world is increasingly embracing these resources to make sense of data, learn more about customers, and optimize operations. That paper introduces artificial intelligence marketing, explores how marketers are harnessing the power of artificial intelligence, and discusses the need to use this strategy for marketing products and services.

Machine learning offers great potential value for marketing-related applications. However, the proliferation of data types, methods, tools, and programming languages hinders knowledge integration among marketing analytics teams, making collaboration difficult. In this regard, Villarroel and Silipo (2021, p. 393) state that visual programming might facilitate the orchestration of ML projects in a more intuitive visual fashion. In terms of marketing strategy, Huang and Rust (2022, p. 218) state that mechanical artificial intelligence can be used for segmentation (segment recognition), thinking artificial intelligence for targeting (segment recommendation), and feeling artificial intelligence for positioning (segment resonance).

The literature review above explains that market and customer segmentation is a fundamental task for the management of business marketing because it relates to the identification of groups of consumers who interact with the products produced by a company. Market research involves gathering consumer perception, often using statistical resources that provide forecasts with a level of probability of success and a degree of statistical significance; however, artificial intelligence resources provide statistical modeling with a degree of certainty in the forecasts.

Machine Learning modeling uses mathematical, econometric, and statistical models. In this case, the k-means model was used, which is a quantitative clustering tool that divides the n observations into k different groups guided by criteria of internal affinity and external difference. Figure 1 illustrates the k-means model obtained using Jupyter Notebook and Anaconda 3.0, which seeks to assign each observation to the group with the nearest center. This algorithm falls into the group of unsupervised algorithms. Clustering of the n observations into the k different groups is performed by minimizing the sum of distances between each observation and the centroid of its cluster.

Source: Prepared by the author.

Figure 1 K-Means Clustering.

Depending on the study’s objectives and the analysis of the cases of identification of cluster groups, the segmentation processes with the k-means algorithm aim to minimize the variance within a cluster, referred to as the search for internal homogeneity. On the one hand, it is essential to find homogeneous groups that can be satisfied with a given marketing mix, and, on the other hand, a wide external heterogeneity between the different groups of a population is desired. Therefore, solutions with higher variance between clusters are favored. Figure 1 shows 5 groups, each with the components that make up the cluster. Cluster analysis is critical in marketing because its purpose is to identify consumer behavior, find opportunities for new products, choose test markets, and reduce data.

Source: Prepared by the author using Anaconda 3.0.

Figure 2 Graphical Representation of Hierarchical Segmentation (Dendrogram).

Figure 2 shows the path of cluster formation according to the distance measure used. The number of clusters depends on where the dendrogram is cut, thus the decision on the optimal number of clusters is subjective. It is advisable to choose an interpretable number of clusters, based on the number of clusters desired, e.g. high-performance clusters, high-availability clusters, and high-efficiency clusters.
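To make the dependence on the cut concrete, the following sketch performs a simple agglomerative clustering and stops merging once the closest pair of clusters is farther apart than a chosen height. The article does not state which linkage criterion was used for Figure 2; single linkage, the function names, and the sample points are assumptions made for illustration.

```python
def euclidean(a, b):
    """Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage_clusters(points, cut_height):
    """Agglomerative clustering with single linkage (assumed criterion).
    Clusters are merged, closest pair first, until the smallest distance
    between any two clusters exceeds cut_height."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        dist, a, b = min(pairs)
        if dist > cut_height:
            break  # this is where the dendrogram is "cut"
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical two-dimensional observations (illustrative only).
points = [(1, 1), (1, 2), (5, 5), (5, 6), (9, 1)]
for height in (0.5, 2, 5.2, 6):
    print(height, len(single_linkage_clusters(points, height)))
# 0.5 5
# 2 3
# 5.2 2
# 6 1
```

The same data yield five, three, two, or one cluster depending only on the cut height, which is the subjectivity the paragraph above refers to.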

This study analyzes the implications of the adoption of artificial intelligence technologies for the location of better market and customer segmentation models, which are particularly important for companies engaged in market and customer management. The quantitative models that describe the behavior of people in the purchasing decision process are multivariate, i.e. a variety of qualitative factors are involved in people’s purchasing decisions. A major factor in the purchasing decisions made by individuals is the set of tastes, preferences, and lifestyles of individuals. The distribution of such manifestations and the advancement of technology have allowed consumers to have access to information almost instantaneously. As a result, small segments of consumer groups with specific consumer profiles have emerged and their requirements prior to making a decision to purchase a given product are considered.

This research reveals the importance of adopting artificial intelligence models (machine learning) in the implementation of segmentation models applied by companies for the strategic management of their customers and markets. It is all the more important in these times of COVID-19 to permanently evaluate the dynamics of changes in consumers’ tastes and preferences as it allows us to assess the efficiency of the application through a performance measurement indicator, namely the Davies-Bouldin index for this study. Clustering models implemented with artificial intelligence tools can be applied in all business and social environments to identify highly differentiated groups that show significant attributes of internal cohesion. Results can be used to implement dynamic, proactive, and innovative policies in organizational management.

From the above, the research hypothesis is as follows: The adoption of artificial intelligence technologies has a positive influence on customer segmentation practices in market research validated by the degree of certainty in segmentation results.

METHODOLOGY

This is an analytical study with a non-experimental design. A 6-question questionnaire was developed, with a 7-level response scale on which only one alternative could be selected for each question. In order to obtain information on the perception of shopping, the questionnaire was addressed to the public that goes to shopping malls to do their shopping. Six (6) attitudinal variables were identified for this purpose. Consumers were asked to express their degree of agreement with the statements described below:

V1: Shopping is fun.
V2: Shopping is not good for my budget.
V3: When I go shopping, I take the opportunity to eat out.
V4: When I go shopping, I look for the best deals.
V5: I don’t feel motivated to go shopping.
V6: I can save a lot of money when I have the chance to shop around.

Each of the questions had seven levels of response, where the lowest level was 1 (totally disagree), and the highest level was 7 (totally agree).

A non-random convenience sample of 506 people of both sexes over 18 years of age was used. The questionnaire was administered through Google Forms and was available online for 15 days during March 2022. Reliability was assessed with the Guttman test, which yielded an index of 0.659, and validity with Bartlett’s test of sphericity, whose χ² statistic had a p-value < 0.05 (95% confidence level).

Upon collecting the data using the questionnaire format, the dataset containing the collected data was prepared and organized in an Excel sheet and then exposed to the clustering model with RapidMiner Studio 9.10, following the process presented in Figure 3. The segmentation tests were executed applying the k-means unsupervised learning model. First, the data of 506 records were uploaded, then the data was subjected to the normalization process, then the clustering was activated and, finally, the results were subjected to the performance evaluation.

Source: Prepared by the author.

Figure 3 Cluster Modeling with RapidMiner Studio 9.10

Figure 3 describes the sequential process of applying the clustering model. The database is imported into the RapidMiner environment; afterward, each of the variables is normalized so that all of them are on a comparable scale, since distance-based clustering is sensitive to differences in scale; next, the clustering model is applied to obtain the results for different numbers of groups, searching for the smallest distance within groups and the largest distance between groups; lastly, the results are subjected to the performance evaluation, finding the number of groups with the best performance.
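The normalization step can be sketched as a z-score transformation of each variable. The article does not say which normalization method was selected in RapidMiner, so the z-score, the use of the population standard deviation, and the sample rows below are assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_score_columns(rows):
    """Rescale every column to mean 0 and standard deviation 1.
    Uses the population standard deviation (an assumption); a constant
    column is mapped to zeros to avoid dividing by zero."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical answers of three respondents to two statements (1-7 scale).
responses = [(1, 7), (4, 5), (7, 3)]
normalized = z_score_columns(responses)
for row in normalized:
    print([round(value, 4) for value in row])
# [-1.2247, 1.2247]
# [0.0, 0.0]
# [1.2247, -1.2247]
```

After this step both columns contribute on the same scale to the Euclidean distance, regardless of how spread out the original answers were.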

As presented in Figure 3, the method used for cluster execution with RapidMiner 9.10 relies on the squared Euclidean distance between two points, in other words, the sum of the squared differences of all the coordinates of the two points, as shown in the following equation:

$$d^2(i, j) = \sum_{k=1}^{p} \left( X_{ik} - X_{jk} \right)^2$$

Where $X_{ik}$ and $X_{jk}$ are the values of points $i$ and $j$ on coordinate $k$ (with $p$ coordinates in total), and the Davies-Bouldin index is used to evaluate performance. The Davies-Bouldin index is a resource used for the internal validation of clustering, which is expressed as follows:

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(C_i, C_j)} \right)$$

Where $k$ is the number of clusters, $\sigma_i$ is the average distance between each point in cluster $i$ and the centroid of that cluster, $\sigma_j$ is the average distance between each point in cluster $j$ and the centroid of that cluster, and $d(C_i, C_j)$ is the distance between the centroids of clusters $i$ and $j$.
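As a worked illustration of the squared Euclidean distance and the Davies-Bouldin index described above, the sketch below computes both for two small, hand-made clusters. The function names and the data are hypothetical; lower values of the index indicate more compact and better separated clusters.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    k = len(clusters)
    centers = [centroid(c) for c in clusters]
    # sigma[i]: average distance from the points of cluster i to its centroid.
    sigma = [
        sum(dist(p, centers[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    total = 0.0
    for i in range(k):
        # Worst-case (largest) similarity ratio of cluster i to any other cluster.
        total += max(
            (sigma[i] + sigma[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two hypothetical clusters (illustrative only).
cluster_a = [(0, 0), (0, 2)]   # centroid (0, 1), average spread 1
cluster_b = [(4, 0), (4, 2)]   # centroid (4, 1), average spread 1

print(squared_euclidean((0, 0), (4, 2)))     # 20
print(davies_bouldin([cluster_a, cluster_b]))  # 0.5
```

Here each cluster has an average spread of 1 and the centroids are 4 units apart, so each ratio is (1 + 1) / 4 = 0.5 and the index is 0.5.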

The k-means clustering algorithm is an unsupervised algorithm that scales well to large volumes of data. To use k-means, the number of groups to be formed must be specified in advance; the optimal number of clusters k can be found by testing different values, which is the main objective of this research.

RESULTS

Figure 4 depicts the descriptive behavior of each of the variables. Respondents’ responses were mostly positive for all the statements; furthermore, respondents find shopping to be fun, an opportunity to compare prices, an opportunity to take advantage of bargains, and a chance to find savings in their personal budgets.

Source: Prepared by the author based on data from the questionnaire.

Figure 4 Distribution of Responses to Segmentation Statements.

Table 1 Descriptive Indicators of the Sample.

Indicator	V1	V2	V3	V4	V5	V6
Mean	5.2253	3.1482	2.8320	5.4921	2.7846	5.5850
Median	6.0000	3.0000	3.0000	6.0000	2.0000	6.0000
Maximum	7.0000	7.0000	7.0000	7.0000	7.0000	7.0000
Minimum	1.0000	1.0000	1.0000	1.0000	1.0000	1.0000
Std. Dev.	1.4144	1.5142	1.4300	1.4762	1.4702	1.4971
Skewness	−0.8732	0.5118	0.5245	−1.1214	0.6147	−1.0996
Kurtosis	3.4038	2.5153	2.4708	3.8242	2.6413	3.6224
Jarque-Bera	67.7465	27.0447	29.1061	120.3733	34.5784	3.6224
Probability	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
Observations	506	506	506	506	506	506
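As a check on how the rows of Table 1 relate to one another, the Jarque-Bera statistic can be recomputed from the skewness S, the kurtosis K, and the number of observations n using the standard formula JB = n/6 · (S² + (K − 3)²/4). The sketch below applies it to the values in the table; the variable and function names are illustrative.

```python
N_OBSERVATIONS = 506

# Skewness and kurtosis as listed in Table 1.
skewness = {"V1": -0.8732, "V2": 0.5118, "V3": 0.5245,
            "V4": -1.1214, "V5": 0.6147, "V6": -1.0996}
kurtosis = {"V1": 3.4038, "V2": 2.5153, "V3": 2.4708,
            "V4": 3.8242, "V5": 2.6413, "V6": 3.6224}


def jarque_bera(n, s, k):
    """Jarque-Bera normality statistic from sample size, skewness and kurtosis."""
    return n / 6 * (s ** 2 + (k - 3) ** 2 / 4)


jb = {
    name: jarque_bera(N_OBSERVATIONS, skewness[name], kurtosis[name])
    for name in skewness
}
for name, value in jb.items():
    print(name, round(value, 2))
```

With the rounded inputs from the table, the recomputed values for V1 to V5 agree with the Jarque-Bera row to within rounding. For V6 the formula gives roughly 110.1, whereas the table lists 3.6224, the same figure as the V6 kurtosis, so that entry may be worth checking against the original output.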