A DATA SCIENCE APPROACH IN A SOCIOECONOMIC ANALYSIS OF PRICES FOR TRANSPORT TRAVEL BY UBER APP

G 1, B 2
1 Federal University ZZ, Brazil. Email: g
2 Federal University. Email: b

Studies that use data from the transport company Uber have shown that several factors contribute to the increase in the prices of its travel services. In this context, this research aims to analyze travel routes for low-income users and contribute to reducing these prices. To this end, we sought to answer: if a financial center were closer to economically poorer neighborhoods, would there be a change in the average prices of these trips? Could this change financially improve the lives of low-income people? To answer these questions, we investigated this factor of financial concentration in territorial regions, analyzing prices and socioeconomic data from the South American city of Fortaleza, Brazil, and from the North American city of Boston, United States of America. The results showed that, in a scenario with a more decentralized financial center, low-income Uber users in Fortaleza could have their trip prices reduced by about 43.07%. This reduction would represent a monthly savings of around 18.82% of their Average Personal Income. For users living in wealthy, high-income neighborhoods, this decentralization would increase travel costs by just over 100%. However, this increase would represent 6.71% of their Average Personal Income.

Keywords: Transport by Application, Socioeconomic Data, Data Science, Exploratory Data Analysis.

INTRODUCTION

According to Quick (2020), transport by apps, also known as taxi by app or paid ride, comprises digital services for passenger transport, meal transport, and delivery of various items. The participation of people in this market, to acquire extra income or even find a more profitable job, is a worldwide phenomenon. Some companies in this niche are Uber, InDriver, Lyft, 99, Cabify, Rappi, and iFood. In this context, some factors can
influence the supply and demand of transport by applications, such as urban and demographic characteristics, income, competition, availability of other means of transport, and the tourist flow peculiar to each city, among others (QUICK, 2020).

In this perspective, recent research has sought to find relationships between the socioeconomic characteristics of a region and some aspects of the Uber app transport company (SILVA, 2020). These aspects are accessibility, waiting time, neighborhood quality-of-life indicators, livability indicators for cities and neighborhoods, and urban planning. However, according to Silva (2020), these studies have not explored the price dimension in the relationship between the service offered by Uber and the socioeconomic characteristics of the places of embarkation and/or disembarkation.

Through a data-oriented methodology, Silva (2020) observed that time and distance are related to the pricing of Uber's travel service in the city of Natal, Brazil, and that this would enable improved supply and demand strategies for that service. However, other factors may also contribute to how these prices are set. In this perspective, this work analyzed prices together with socioeconomic data from the cities of Fortaleza, in Brazil, and Boston, in the United States of America, in order to show that users who live in financially poorer neighborhoods and use Uber's travel service end up paying more than residents of wealthier neighborhoods when the destination is the commercial center. Thus, it would be possible to propose a decentralization of these centers to ease the burden on the income of the poorest users without significantly affecting users in wealthier regions.

Since both cities showed the same pattern of high prices for trips to commercial centers starting from poorer neighborhoods, this research also proposes a new functionality for Uber to give more freedom to the user of the
service. This is the choice of a trip by means of a price bid, in which the application returns the best distances for the offered price. Distance prediction models were created to achieve this functionality, using regression algorithms and validation tests.

So far, no public data from app-based travel companies has been found for the city of Fortaleza. Due to current legislation, such as the General Data Protection Law (GDPL), the disclosure of information in this context is restricted. Because of this, we chose to simulate a database based on Uber's price simulation platform and on research into peak hours, weekly demand and service offer, traffic, and prices for that city. The relationships found served as a basis for evidence of the high prices offered to travelers residing in financially poorer neighborhoods whose destination was the city's commercial center. With the socioeconomic data obtained for the city of Boston and the results of the analyses, it was possible to show the same behavior of price increases for trips from poorer neighborhoods to destinations concentrated in financial centers.

We chose the city of Fortaleza because of the ease of finding socioeconomic data at the neighborhood level and because it belongs to the Northeast region of Brazil, sharing socioeconomic characteristics with other capitals in this region: heavy traffic, a predominantly hot climate, and income inequality (SILVEIRA, 2020). Likewise, the city of Boston was chosen because it was the city for which we found the most complete and coherent database at the neighborhood level, although its socioeconomic data are not as easily accessible as those found for Fortaleza.

This research aims to analyze the travel routes of low-income users and contribute to reducing the prices of these trips in transport by the Uber app. To this end, we seek to answer: if a financial center were closer to economically poorer neighborhoods, would there be a change in average prices? Could this change financially
improve the lives of low-income people?

The rest of this paper is structured as follows. Section 2 deals with information about transport by applications, emphasizing the company Uber and its importance in the market. Section 3 shows the strategy for applying the contents collected in Background and Related Works to carry out the analysis, and the procedure for representing the observations. Section 4 deals with analyzing travel data for both cities; in this section, we also analyze socioeconomic income data at the neighborhood level to highlight the possible relationships between travel prices and regions of high and low commercial concentration. In Section 5, we summarize the results obtained from the previous analyses and discuss the possible contributions of this research.

BACKGROUND AND RELATED WORKS

Subsection 2.1 explains the socioeconomic data of Fortaleza, and subsection 2.2 covers related works.

Socioeconomic Data of Fortaleza

This subsection deals with obtaining socioeconomic data for the city of Fortaleza. This research focused on the income dimension for the two cities, making it possible to relate travel prices to per capita income and to keep a consistent financial perspective throughout the study.

The city of Fortaleza, the capital of the state of Ceará, is located on the shores of the Atlantic Ocean in the Northeast region of Brazil, in a tropical climate zone marked by high humidity. The local economy is relatively diversified, with significant secondary and tertiary activities. Its territory is highly sought after by tourists due to its beautiful beaches and rich local culture (GOVERNO DO CEARÁ, 2022).

Data from the Brazilian Institute of Geography and Statistics (BIGS) in 2010 revealed that, in the income item, one of the three main items of the Human Development Index (HDI), the average value per capita in Brazilian currency in Fortaleza increased by 85.18% over the last two decades, from R$457.04 in 1991 to R$610.48 in 2000 and R$846.36
in 2010. This index, called HDI R, represents the average monthly income per capita by neighborhood. Extreme poverty, measured as the proportion of people with a per capita household income of less than R$70.00, went from 15.25% in 1991 to 9.02% in 2000 and 3.36% in 2010.

According to Silveira (2020), the HDI R is considered an indicator of the average potential of residents of a neighborhood to obtain goods and services. It is used as an indicator of people's ability to secure a standard of living that can meet their basic needs. Table 1 shows five neighborhoods of each income class (rich, poor, and middle) for the city of Fortaleza in terms of HDI R in 2010. The values refer to people aged ten years and over.

Table 1. HDI R of some neighborhoods of different income classes in Fortaleza

Neighborhood                  HDI R    Average personal income (R$)

Rich neighborhoods (class average: R$3,099.72)
  Meireles                    0.953    3,659.54
  Aldeota                     0.778    2,901.57
  Dionísio Torres             0.722    2,707.35
  Mucuripe                    0.732    2,742.25
  Guararapes                  0.950    3,488.25

Poor neighborhoods (class average: R$301.88)
  Conjunto Palmeiras          0.010      239.25
  Parque Presidente Vargas    0.014      287.92
  Canindezinho                0.025      325.47
  Genibaú                     0.027      329.98
  Siqueira                    0.026      326.80

Middle neighborhoods (class average: R$591.76)
  Autran Nunes                0.032      349.74
  Dendê                       0.115      633.44
  Parque Dois Irmãos          0.093      557.84
  Cajazeiras                  0.155      768.93
  Messejana                   0.120      648.89

Source: Adapted from the Municipal Secretariat for Economic Development of Fortaleza, based on data from the 2010 Demographic Census.

The neighborhoods in the worst situation in terms of HDI R are Conjunto Palmeiras, Parque Presidente Vargas, Canindezinho, Siqueira, and Genibaú. One factor contributing to these numbers was the rising unemployment in recent years. This rate was the seventh-highest among the country's metropolitan regions in the same period. According to data obtained by the NHSS in 2021, the unemployment rate in the state reflects the deterioration of the labor market amid the new Coronavirus pandemic. The Covid-19 pandemic reversed the recovery trend in economic activities, resulting in a
significant drop in the growth rate in 2020.

Related Works

According to Wang et al. (2018), studies have used travel time as a comparative measure to understand the imbalance or balance between employment and housing and the racial, economic, and gender disparities in urban areas. The authors organized their research around two questions: first, whether waiting time can serve as an intermediate proxy for accessibility in Uber's travel service; second, considering Uber as a virtual transport infrastructure, whether the company is associated with socio-spatial polarization among neighborhoods or with more equitable access regardless of socioeconomic profiles.

The results by Wang et al. (2018) indicated that, for UberX, the estimated average waiting time is around 3 to 10 minutes, with a standard deviation of around 1 to 3 minutes. For UberBlack, the estimated average wait time is around 3 to 13 minutes, with a standard deviation of around 1 to 3 minutes. The authors point out that it is not surprising that UberX has a more concentrated distribution and a lower average value than the other service, as UberX is more popular and cost-effective, which likely results in greater availability of this type of service in the market. In addition, UberBlack cost at least three times more per minute than the UberX service and four times more per mile.

Bezerra et al. (2019) explored using estimated arrival time data from Uber ride requests as a simple indicator of urban livability. Due to its nature, scale, and coverage, Uber provides objective data on the interaction between a city's inhabitants and its infrastructure, mainly transport infrastructure. It is thus possible to compare data at multiple levels, such as cities and neighborhoods, and to obtain context-sensitive data that give insights into the impact of other factors affecting Uber drivers and trips, including traffic incidents, weather, and other events. In this perspective, Bezerra et al. (2019) investigated the possibility of using
Uber data to provide a simple, fast, low-cost, time- and context-sensitive indicator of urban livability. For this, the authors considered the Uber Ride Request (URR) API for the Brazilian city of Natal.

In order to test the hypothesis that the pricing of Uber's travel services is related to the socioeconomic characteristics of the places where these trips start, Silva et al. (2020) carried out a study for the same city of Natal. To achieve this objective, they collected data on trip prices of the UberX service type throughout 2018, in addition to socioeconomic data at the level of Human Development Units provided by the Atlas of Human Development in Brazil. With the data obtained, it was possible to build predictive models using Machine Learning techniques so that they could later be submitted to regression analysis.

The previous works sought to highlight relationships between information about Uber trips and socioeconomic characteristics in specific locations. These relationships covered aspects of price and distance behavior when analyzed together with social factors that could indicate an influence on the supply and demand of Uber's travel services. The findings of these studies indicated that some socioeconomic characteristics of places could influence how prices are set. Given this, it is possible to analyze the behavior of these prices on travel routes between regions with different income levels when the destination of the travel requests is a region's financial center. From this perspective, the present study seeks indications that a lower concentration of financial centers can contribute to reducing Uber's travel prices for users residing in economically poorer regions. Analyzing other regions with other economic contexts could indicate that a concentration of commercial activities influences Uber's trip pricing process.

PROPOSED METHOD

The process adopted in this execution strategy was adjusted considering the context
of the research, which involves Socioeconomic Data and travel prices in transport by application. The steps are: Obtaining UberX Pricing Data, Obtaining Socioeconomic Data, Exploratory Data Analysis, Cleaning and Processing Data, Creating Predictive Models, and Obtaining Results. Figure 1 illustrates the flow of these steps.

Figure 1. Process flow used in the research. Source: Author (2022).

The process flow shows that these steps can be used in different application contexts. However, for this study, the Uber Price Data Collection and Data Cleaning and Processing steps were adapted to the research objectives. The additional step of Obtaining Socioeconomic Data consists of collecting social information related to a locality or region. This stage has several dimensions: Education, Age, Income, Ethnicity, and Employment, among others.

To obtain Uber Price Data, a strategy was developed to create a database containing the prices of Uber trips in the city of Fortaleza. These prices needed to be consistent with those actually practiced by the service. For this reason, research was carried out to obtain information supporting the logic used to create these prices. On the other hand, prices for trips in Boston did not need to be simulated, since it was possible to find a real database containing this information.

For the Data Cleaning and Treatment stage, a statistical analysis was carried out to validate the database created for the city of Fortaleza, so that the simulated price values could be supported statistically. This analysis was not performed for the city of Boston, since the prices in its database came from real trips. In addition, for this step, a graph analysis was performed using Centrality Measures. These measures helped analyze the impact of travel paths for different routes. In this way, it would be possible to determine whether a particular neighborhood would have a greater or lesser impact on the
average price of trips if a trajectory were changed.

In this context, five neighborhoods of each class (rich, middle, and poor) were selected for the experiment in the city of Fortaleza. The selection criteria took into account both income and location. For this, the information on the Average Personal Income of residents aged ten years or older, presented in the BACKGROUND AND RELATED WORKS section, was used. Thus, five representatives of each class were extracted, considering the distances from the wealthy region (the commercial center, encompassing the geographic North and East of the city), from the median region (the geographic center of the city), and from the peripheral region (the geographic South and Southeast of the city). As found in the research, the city of Fortaleza concentrates its wealthy neighborhoods in the North and East, which coincide with the commercial center and have residents with high per capita income. The geographic center of the city coincides with neighborhoods of intermediate per capita income, and the peripheral region (South and Southeast) contains the majority of low-income neighborhoods.

For the city of Boston, some neighborhoods and regions belonging to neighborhoods were selected. The choice took into account their geographical distance from the city's financial center. The aim was to observe the behavior of high and low prices when travel requests came from neighborhoods or regions farther from or closer to this financial center. With the results of this analysis, it would be possible to show a price trend similar to that observed in the city of Fortaleza. The choice of neighborhoods and regions also took into account the availability of information on per capita income in the neighborhood or region. In total, 12 neighborhoods and regions of the city were selected; they are described in the BACKGROUND AND RELATED WORKS section.

Obtaining Pricing Data from Uber

The prices analyzed were for the UberX service type. This was
determined due to the following considerations. As noted earlier, UberX is the most popular service of the Uber company, providing more affordable prices for the population. This makes it possible for users of various income groups to access this type of service, broadening the range of service users. In addition, the database of real trip prices obtained for the city of Boston contains information from the UberX service. Because of this, the simulations to create the Fortaleza database took into account the same type of service, allowing for more homogeneity in the observations for both cities under study.

To determine the prices of the trips that make up the simulated Fortaleza database, the Uber price simulator and the research on particular city characteristics were used. In this way, it was possible to generate price samples for trips between neighborhoods, including trips to and from the same neighborhood. In addition, the research found evidence that travel prices vary on certain days and times. When necessary, prices were therefore readjusted according to the time and day, since the price tends to increase at peak times and on certain days.

Uber's pricing simulator includes the company's policy rules in its pricing calculations, taking into account location, base rate, dynamic rate, and variations in supply and demand. However, to make the pricing logic more realistic, it was necessary to consider information inherent to the locations under study. Through the research, it was observed that Uber trips in the city occur with a division between neighborhoods of about 4%. That is, approximately 4% of trips are divided among the poor-rich, poor-medium, and medium-rich routes and the other possible routes between them. Trips originating in middle-income neighborhoods are more likely to have the same neighborhood as a destination, so a higher percentage of about 48% was defined for these cases. That is, 48% of trips tend to occur within the same neighborhood. This value ensures that the
largest share of travel happens within the same region. This percentage could be different, as long as it is significant enough to establish that most travel takes place in the same region. In the other cases, the destination of a user from this neighborhood could be any other neighborhood in the city, with each of the other possibilities assigned a portion of 16% of the total trips (48% to 64%, 64% to 80%, and 80% to 96%).

As the price is related to the day and time, it was also necessary to include a logic consistent with the city's reality. The research found that Uber trips happen primarily on weekends and that a larger number of requests happen from 7 am to 2 pm and from 5 pm to 8 pm (IPLANFOR, 2015; G1 CE, 2021).

In parallel with the organization of this information, simulations were carried out with the company's price simulator: for each day of the week, a price sample was collected every 15 minutes for all neighborhoods considered in this study. This process was carried out for two weeks, between the end of August and the beginning of September 2021. The experiment also used a period without holidays or significant events in the city. After that, the arithmetic mean for each route was calculated. With the simulated values in hand, it was possible to obtain a range of price values for each route under study.

With the considerations above about the day of the week and the time of day, it was possible to build a database more consistent with the reality of the city, containing the prices of simulated trips and other attributes. In total, a sample of 100,000 travel prices was generated.

If the origin and destination are different, then the price range corresponds to the other prices simulated by the Uber price simulator. Although this logic is repeated for the other routes and price ranges, it is necessary to consider the price adjustment factor, which is specific to each location. Because of this, a readjustment function called reajustepreço was
implemented, which is called when the price list corresponding to each travel route is created. According to this research, Fortaleza has its own pattern of variation in the price of Uber trips depending on the time of day. For this reason, the adjustment logic considered an increase factor between R$1.00 and R$2.00, which are the average values found for this type of variation.

In the algorithm, the logic is based on checking the peak-hour intervals that occur during a day in that city. Depending on the value of the randomness seed, the day will or will not be a weekend, and this causes the readjustment function to be called when relevant. Figure 2 illustrates a flowchart of these steps.

Figure 2. Flowchart for building the Fortaleza database. Source: Author (2022).

For the city of Boston, prices are already recorded in the real database provided by the Kaggle platform. This database has 46 columns containing information on travel origin and destination, weather, month, distance, time, day, price, and other information about a trip. It contains just over 600,000 rows and refers to data collected in 2018.

The trip prices in that city's database can provide another indication of price increases for travel routes destined for a region's financial centers. In this way, it is possible to substantiate, through another economic context (a developed country), that there is an indication of high prices for Uber trips when the destination tends to be a financial center.

Obtaining Socioeconomic Data

Uber does not provide socioeconomic information on users of its travel services. An alternative means of obtaining this data is to use the characteristics of the place where these services are requested. This study primarily used information on people's income by neighborhood. In this way, it would be possible to analyze the financial impacts for users who leave the neighborhoods they live in
considering routes between neighborhoods of different classes. In addition, other general information about the study sites, seen in the BACKGROUND AND RELATED WORKS section, was considered in the analyses.

In this context, data from the Human Development Index (HDI) were used for the city of Fortaleza. These data are based on the 2010 Brazilian Demographic Census, the last carried out at the neighborhood level. For research purposes, information on HDI R, HDI B, and average personal income of 5 neighborhoods of each class (rich, poor, and middle) in the city of Fortaleza was used. The geographic locations were chosen considering a division that evidenced distinctions in the incomes of the residents of these neighborhoods, since, for this city, the farther a neighborhood is from the commercial center, the lower its income. Table 1 of the Background and Related Works section displays HDI R information and average personal income. Information on the HDIs for all neighborhoods in the city can be obtained from FORTALEZA (2022).

Since the data on average personal income are from 2010, they were converted to real values for 2021. This was done because the simulations of travel prices were carried out in that period. The conversion mitigates, as far as possible, the differences due to inflation, interest rates, and economic changes over this period. To achieve this, the Citizen Calculator provided by the Central Bank of Brazil was used, which converts monetary values in Reais from one year into monetary values in Reais of another year (BANCO CENTRAL DO BRASIL, 2022). For the conversion of the values by the calculator, the Broad National Consumer Price Index (IPCA) for the year 2021 was considered since, according to the literature, it is the index most used for monetary correction. The IPCA portrays the broader scope of inflation in the economy.

The socioeconomic information obtained for the city of Boston was restricted to data collected by the 2017 Boston Planning
and Development Division (BPDARD) Research Agency. This agency conducted a survey formalized in a document called Neighborhood Profiles. From it, some neighborhoods and regions of the city were extracted in order to obtain evidence of an upward or downward trend in prices, taking the financial center of the city as a reference point. Information in this context can be found in the Background and Related Works section.

Data Cleaning and Treatment

According to Haughton et al. (2003), Data Cleaning and Treatment steps are relevant because unprocessed data are usually non-uniform and unpredictable. In addition, data can drive up costs when used raw. From this perspective, the missing data (missing values) in the Boston database were analyzed. The analysis showed that they came from different origins that lacked values for specific attributes. For this reason, and because the total amount of missing data represents a small fraction of the entire database, it was decided to eliminate them. On the other hand, there were no missing data for the city of Fortaleza, since the database was generated in a controlled environment through simulations, research, and mathematical calculations.

Another important point regarding data cleaning and treatment is the observation of outliers (points outside the curve), as they can generate invalid data that do not portray reality. These points present a significant numerical departure from the other observations. In our context of UberX travel pricing data, outliers emerge as very high prices relative to the mean and median for the city of Boston. There are statistical techniques, such as the Z-score, to detect these points. This technique indicates the numerical distance between a point and the sample mean. It is based on the standard deviation and tries to mitigate the influence of data location and scale. This technique was chosen because the Boston database is in
the order of hundreds of thousands of rows. On the other hand, the Fortaleza database does not present outliers, since the simulations performed with the company's simulator provided ranges of values close to the averages obtained later.

Another essential point in data treatment is verifying the distribution of the sample data. Many statistical methods assume that the data follow a Normal Distribution. However, much real data is not normally distributed and violates the assumptions of some of these tests, such as Student's t-test. For this reason, it was verified whether the trip prices of the routes under study followed a Normal Distribution in the Fortaleza database. The paths under study were poor-rich, poor-medium, and medium-rich. Each path was checked for normality, followed by the Kolmogorov-Smirnov normality test for validation. The results confirmed that travel prices for this simulated database did not follow a Normal Distribution. Because of this, the non-parametric Wilcoxon test was used.

The Wilcoxon test is commonly treated as a non-parametric version of Student's t-test for paired samples. It tests whether the distribution of differences between two samples is symmetric and zero-centered. The pairs considered were the poor-rich and poor-medium paths mentioned above. Thus, considering a confidence level of 95% and a significance level (alpha) of 5%, the test was run with the right-sided alternative hypothesis that the average price of the poor-rich route is higher than the average price of the poor-medium route. In this way, it would be possible to verify whether there is statistically significant evidence that the average price of the poor-rich route is higher than the average price of the poor-medium route; in other words, whether the groups differ in a statistically meaningful way.

As the literature states that, generally, data on the provision of services that involve monetary values are not predictable due
to the very nature of how these values are set (for example, transport trips by application), the statistical tests were performed only for the Fortaleza database, since it was a simulated database that needed this verification to confirm that its data are realistic. The Boston database was not statistically tested in this context, not only for these reasons but also because it is a real database of app-based transport trip prices that served as support for the observations carried out for the city of Fortaleza.

Considering the weight of the databases in this study and the fact that the Fortaleza database is simulated, some centrality measures for graphs were also computed for this database to confirm that its data, and the research carried out for this city, are realistic. First, the data from the Fortaleza database were converted into a weighted graph in which the weight of each edge is the average price of trips between the neighborhoods considered in the study. The routes between these neighborhoods followed the database creation logic described earlier in this research. The neighborhoods are listed in Table 1.

To convert this database into a graph, the networkx library was used, which allowed manipulation at the level of nodes and edges and the assignment of the price as the weighting factor. Furthermore, this library made it possible to render the graph so that the paths between the neighborhoods under study were visible, along with their respective weights. This manipulation made it possible to use the Centrality Metrics for graphs. The methods for calculating these metrics are also available in the networkx library. The metrics used in this research were Degree Centrality, Closeness (Proximity) Centrality, and Betweenness (Intermediation) Centrality. The nxviz library was used to create the plots with the results of the Centrality Metrics.

Creating Predictive Models

As previously stated, the database for the city of
Boston is real; that is, it is a compilation of information about various UberX trips between the neighborhoods of this city. The analyses on this base support the studies on UberX trips in Fortaleza by providing evidence that, even in another context (another country with different socioeconomic conditions, climate, and other aspects), many price-related behaviors can be similar regardless of location. The Boston database has attributes for each trip such as time, day, month, origin, destination, price, distance, and weather. Considering that this research focuses on travel prices, an analysis was carried out using a Correlation Matrix to verify which pairs of attributes were most correlated. Figure 3 illustrates this matrix.

Figure 3. Correlation matrix for the Boston database. Source: Author (2021).

In this matrix, the closer to 1 the value at the intersection of two attributes, the more strongly correlated they are. Conversely, the closer to -1, the more inversely correlated they are, and values near 0 indicate little linear correlation. It is observed that price and distance are the pair of attributes with the highest correlation. Due to this and to the research carried out for the city of Fortaleza, it was observed that price and distance are major influencers of the UberX travel service, more so than other attributes such as weather, time, and day.

Considering this situation and the analysis to verify the trend convergence for the prices of both cities, a new functionality was proposed for Uber. It was proposed because Uber offers trips with their prices already stipulated. However, in this market segment some companies adopt the inverse mechanism, in which the user offers a price and the transport application returns the best distances for that offered price. An example of this strategy is the company InDriver (2022). In this sense, the creation of Machine Learning models was proposed to predict the dependent variable distance based on the independent variable price and
other independent variables in the Boston database. In this way, it would be possible to give the customer more freedom to choose the price of a trip and, consequently, enable a reduction in costs.

The Machine Learning algorithms used to create these models were chosen because they are widely used in the regression literature, considering the data domain used in this research. The algorithms are Simple Linear Regression, SGD Regression, Decision Tree, and Random Forest. The method for verifying the goodness of fit of the models was the R² Score. The method used to standardize the base variables was Standard Scaler. The method used to perform the cross-validation was Cross Val Score. The parameters used in the methods followed those recommended by the literature for the domain of the values involved and the size of the database considered. For the Train Test Split method, a Test Size of 20% and a Train Size of 80% were used.

EXPERIMENTS AND DISCUSSION OF RESULTS

This section describes and discusses the results for the cities under study. It covers the distance prediction models for the city of Boston, as well as the behavior of prices considering changes in travel routes for the city of Fortaleza. For Fortaleza, the statistical analyses and centrality measures are also described and discussed. Finally, a possible change in the prices of Uber's trips in Fortaleza is discussed for the case in which its financial center were more widely distributed among the neighborhoods.

Boston

A scatter plot (Figure 4) was plotted to show price variability in relation to travel distance. In this way, it would be possible to verify how prevalent outliers were in the base and whether distance and price increase proportionally.

Figure 4. Dispersion between price and distance for Uber rides in Boston. Source: Author (2021).

There are trips on Uber with longer
distances and lower prices, but there are also trips with shorter distances and higher prices. The results previously observed in the Correlation Matrix help to understand this behavior: distance and price are the most correlated among all pairs of variables in the base.

Figure 5 below indicates average prices per request destination. This graph shows that trips to Fenway, the Financial District, Boston University, and Northeastern University tend to be more expensive than trips to other neighborhoods.

Figure 5. Average prices by Travel Destination. Source: Author (2021).

The Financial District is the city's financial center, which may explain the higher prices. Boston University and Northeastern University attract many students who use their shuttle services. Although these universities have slightly higher prices than other destination neighborhoods, they have a lower average price than trips to the financial center. This may indicate that neighborhoods a little further away from the financial center allow for lower travel prices than those charged for the financial center, even though they have attractions such as universities, hotels, museums, and other high-demand locations.

After the Exploratory Data Analysis and the examination of the Correlation Matrix, Machine Learning models were created considering the distance variable as dependent on the others in the base. As the data are continuous and are being treated by regressors, the Coefficient of Determination (R²) was obtained for each regressor. For this, cross-validation was used with 5 folds and R² as the evaluation parameter. Thus, five values of R² were obtained for each regressor under study, and their respective averages were calculated. The
results can be seen in Table 2 below.

Table 2. Results of the Average Coefficients of Determination.

TECHNIQUE            R² average
LINEAR REGRESSION    8
SGD REGRESSION       7
DECISION TREE        91
RANDOM FOREST        94

Source: Author (2021).

Random Forest is one of the most used Machine Learning algorithms for regression on continuous-domain datasets and for classification of binary values. Because this algorithm combines the predictions of a set of Decision Trees into a single output, it tends to perform better than each tree in isolation, owing to the possibility of variance reduction (ALVARENGA, 2018). It can be seen from the table above that an average R² of 94 was obtained for this algorithm, the highest among the techniques used. This means that this model fits the data well and that the predictor variables explain 94% of the variation in the predicted distance data.

Fortaleza

One of the main factors determining the socioeconomic disparities between the studied neighborhoods of Fortaleza is the HDI. Figure 6 shows the indices, from the most developed to the least developed neighborhoods.

Figure 6. HDI of the neighborhoods under study for Fortaleza. Source: Author (2021).

Similar to the HDI graph, Figure 7 illustrates the differences in the average personal income of inhabitants aged ten years and over in each neighborhood under study.

Figure 7. HDI Average income of inhabitants of some neighborhoods in Fortaleza. Source: Author (2021).

In Figure 8, we can see the disparity in the average prices of trips according to origin.

Figure 8. Average price per neighborhood of origin of the service request. Source: Author (2021).

It is noted that users from poorer and less developed neighborhoods in Fortaleza end up spending more on commuting via app transport than those departing from wealthier neighborhoods. For example, considering the poor neighborhood Canindezinho, the average price for a user who requests a trip with that neighborhood as the origin is R$ 17.23, the
destination being any of the other neighborhoods under study.

Figure 9 illustrates the average price according to the time of travel. Note how much more advantageous it is to use the Uber service at night, from 8 pm, in Fortaleza.

Figure 9. Average price of trips per hour in Fortaleza. Source: Author (2021).

Figure 10 illustrates the average prices considering each class of the neighborhoods under study: poor, medium, and rich. Thus, it is possible to analyze the behavior of prices from a more holistic view, distinguishing travel routes by neighborhood class.

Figure 10. Average prices per route from poorer to richer neighborhoods. Source: Author (2022).

It can be seen from the figure above that the route between poor and rich neighborhoods has the highest average prices. On the other hand, the route between the middle and wealthy neighborhoods has the lowest average. This corroborates the research for this city, in that middle-income neighborhoods, in addition to being geographically closer to the commercial center and the wealthiest neighborhoods, have a lower cost when requesting Uber trips than poor neighborhoods.

As a complement to Figure 10, Figure 11 was plotted below. It illustrates the inverse of the path shown above: travel from wealthier neighborhoods to poorer neighborhoods.

Figure 11. Average prices per route between richer to poorer neighborhoods. Source: Author (2022).

It can be seen from the graph above that the upward behavior of prices is repeated, but with an increase in the averages. This may show that users returning to poorer neighborhoods usually pay even more than on the outbound trip. This can be related to security issues, mobility, and supply and demand.

After the Exploratory Data Analysis, a statistical analysis was performed for this Fortaleza database. The purpose of this analysis was to verify the validity of this simulated base in statistical terms, so that it was possible to extract information more consistent with reality. In
this sense, it was verified whether the samples of the studied routes followed a Normal Distribution. Each sample consisted of 1,000 trips, filtered by the desired route: poor-rich, poor-medium, and medium-rich. The first path verified was the poor-rich one. The value of p was smaller than that of alpha (α). This means that the null hypothesis of normality is rejected and this distribution is not Normal. The other routes under study showed the same behavior, following a non-normal distribution. Table 3 contains the p-values for the routes under study, considering the alpha value assigned.

Table 3. P-values for the routes under study.

TRAVEL PATH       P-VALUE (α of 5%)
POOR TO MIDDLE    3.9E-06
POOR TO RICH      2.7E-55
MIDDLE TO RICH    1.3E-06

Source: Author (2022).

As the p-value was less than the alpha significance level, the probability of obtaining data like these under a Normal Distribution is very small. Thus, it can be concluded that the values of travel prices do not follow a Normal Distribution.

Since the results above showed that travel prices do not follow a Normal Distribution, Wilcoxon's non-parametric test was used. The pairs considered for this test were the poor-rich and poor-medium paths. Considering a confidence level of 95% and an alpha significance level of 5%, the test was implemented as a right-sided test, with the alternative hypothesis that the average price of trips on the poor-rich route was higher than the average price of trips on the poor-medium route. As a result, a p-value lower than the alpha value was obtained, showing with statistical significance that the average price of travel from poor to rich neighborhoods is higher than the average price of travel from poor to medium neighborhoods. In other words, the groups differ in a statistically significant way, which corroborates the histogram of average travel prices between rich, poor, and medium neighborhoods.

Then, these samples were analyzed with some centrality measures for graphs. This analysis was important for emphasizing how important a neighborhood can be in influencing
the pricing of in-app transport trips. The samples under study were first converted into graph-type objects. After the conversion, it was possible to plot the routes in terms of their travel connections with their respective average prices. For a positioning more consistent with reality, a graph of these neighborhoods was plotted on a map of the city of Fortaleza. For this, each neighborhood's latitude and longitude were collected to preserve real distances and proportions. This way, it would be possible to visualize the distances between the neighborhoods geographically. Figure 12 illustrates this representation.

Figure 12. Graph of the neighborhoods under study on a map of Fortaleza. Source: Author (2022).

From the figure above, we can see the great distance between the poor neighborhoods in the Southwest part of the map and the wealthy neighborhoods in the Northeast portion. In turn, the middle-income neighborhoods are represented by the nodes on the transverse line on the map.

After converting the samples into a graph, it was possible to determine the centrality measures. The measures used were Degree Centrality, Proximity Centrality (Closeness), and Intermediation Centrality (Betweenness). Figure 13 below indicates the Degree Centrality for the data under analysis.

Figure 13. Degree Centrality for the data under analysis. Source: Author (2022).

It can be seen from the graph above that the poorest neighborhoods under study (Canindezinho, Parque Presidente Vargas) have a higher Degree Centrality. This means that these neighborhoods have a high frequency of travel requests and are those that minimize the average price for a given route. This reflects the research context, which weights the edges by average prices, and the fact that an important node in a graph is one connected to many nodes. It shows that, in this context, the importance of a neighborhood takes into account the price of the trip and not just geography.

The following measure analyzed was the
Proximity Centrality. Figure 14 below illustrates its behavior.

Figure 14. Proximity Centrality for the data under analysis. Source: Author (2022).

Again, high values are noted for the poorest neighborhoods. This is because many poor neighborhoods are close to each other in average prices. Taking the poor neighborhood of Canindezinho as an example, this indicates that it is close to most of the poor neighborhoods analyzed; in other words, its average prices are closer to those of the others. On the other hand, Cajazeiras, a middle-class neighborhood, presents a greater distance in average prices from the other middle-class neighborhoods analyzed. In the context of this research, this means that this proximity is not restricted to geography. The poor neighborhoods considered, because they are geographically close, presented similar price averages, hence the high value of this measure for the poor neighborhoods and the lower values for the middle and wealthy neighborhoods, which are more distant from each other. It is essential to point out that the weight considered in the experiments was the average price, which shows that travel prices can also indicate the proximity between places from a price perspective and not only from a geographic perspective.

Finally, the Intermediation Centrality was analyzed. Figure 15 below illustrates its behavior.

Figure 15. Intermediation Centrality for the data under analysis. Source: Author (2022).

The figure above highlights middle-class neighborhoods such as Cajazeiras and Dendê, in addition to rich neighborhoods such as Meireles. This metric indicates the number of times a node acts as a bridge along a path. As the middle-class neighborhoods are geographically located in the central portion of the city, this shows how much they act as a bridge between the poor neighborhoods in the South and Southwest portions and the wealthy neighborhoods in the North and Northeast portions. The presence
of the wealthy Meireles neighborhood with a high value for this metric reflects its geographical proximity to the other wealthy neighborhoods and the commercial center, indicating that many routes pass through it. It is important to note that, as the edge weight in the graph refers to the price of the trip, a bridge neighborhood may be at a greater distance than another neighborhood considered a bridge, given the possibility of a trip with a lower price but a greater distance, depending on the day, time, and demand for the transport service by application.

After analyzing the data from the Fortaleza database, as well as the socioeconomic study for this city, it was observed that there is a high concentration of income and a consequent geographic segregation of the population: people with greater purchasing power occupy the East area of the city, which concentrates the noblest portion of the neighborhoods and the commercial center, and those with less purchasing power occupy the South and West zones. In addition, there was daily overcrowding of the public bus system and congestion on the main roads. A large part of this problem is explained by the concentration of most economic activities and, consequently, of the largest number of jobs in the North and Northeast regions. In contrast, most of the population resided in the West and South zones of the city, hence the need to travel long distances.

Considering this situation, 2010 data were collected on the Average Monthly Income of people aged ten years and over, by neighborhood. However, as these socioeconomic data are from 2010, it is essential to adjust their values to better match the prices currently practiced in the market. Considering that the simulations of UberX trips in Fortaleza were carried out in 2021, these income values were converted accordingly. The conversion of the Average Monthly Income values was carried out using
the general IPCA inflation index for 2021, considering the weights seen in the BACKGROUND AND RELATED WORKS section. After the conversion, the values were as follows.

Conversion of values for 2021:
Average Personal Income of all wealthy neighborhoods: R$ 3,089.31
Average Personal Income of all medium-sized neighborhoods: R$ 1,221.43
Average Personal Income of all poor neighborhoods: R$ 792.86

A survey carried out in 2019 by the National Association of Urban Transport Companies (NTU), which brings together more than 500 urban and metropolitan bus companies throughout Brazil, shows that transport by apps attracts more people who habitually use only public transport than people who frequently used their cars (DIÁRIO DO TRANSPORTE, 2020). According to the survey, more than 60% of app users came from public transport. The survey focused on ten Brazilian capitals, including Fortaleza, and was carried out through electronic interviews using questionnaires on social networks. A total of 1,410 questionnaires were analyzed, from October 16 to November 22, 2019. Most respondents (52%) use transport by app 2 to 4 times a week, an average of 3 trips per week. The survey showed that work is the main reason passengers use this service every day, while sporadic or weekly users use the service mainly for leisure.

Considering that most people in the city of Fortaleza need to travel to the commercial center, mainly from the peripheral neighborhoods, and that these trips have education and work as their leading causes, this research adopted an average of 3 trips per week both for people who live in a low-income (peripheral) neighborhood and for people who live in a high-income neighborhood (close to the commercial center). Considering that most months of the year have four weeks, the number of monthly trips would be 12 on average. In this sense, considering transportation by the Uber app, the research showed that the average price of trips in
low-income (poor) neighborhoods among themselves (R$ 13.54) is lower than the average price of trips from poor neighborhoods to higher-income neighborhoods, that is, the poor-medium route (R$ 16.57) and the poor-rich route (R$ 28.88). A proposal to reduce these costs of Uber trips for users in the poorest neighborhoods would be a greater decentralization of the city's commerce, encompassing more of the middle and poor neighborhoods, since the city's commerce is heavily concentrated in its North and Northeast regions. From this perspective, it was evidenced that the commercial center lies around the wealthy neighborhoods.

Users in poor neighborhoods spend an average of R$ 28.88 to get to the commercial center, counting only the one-way trip, since the user can return to their neighborhood by another means of transport. If a user in a poor neighborhood travels to the commercial center on average 12 times a month, the monthly cost would be R$ 346.56. This corresponds to about 43.71% of the average personal income of a person residing in a low-income neighborhood. On the other hand, users in wealthy neighborhoods spend an average of R$ 8.56 per trip to the commercial center, a region located mostly within these same neighborhoods (rich-rich trips). Also considering an average of 12 trips a month, the monthly cost would be R$ 102.72. This corresponds to about 3.32% of the average personal income of a person residing in a high-income neighborhood. This means that users in poor neighborhoods spend about three times more on Uber trips in Fortaleza than those who live in wealthy neighborhoods when the destination is the city's commercial center.

In addition, users from poor neighborhoods spend an average of R$ 16.57 on travel to medium-sized neighborhoods. This means a difference of R$ 12.44 compared to the average price of travel from poor to rich neighborhoods. Over the same 12 monthly trips, this amounts to R$ 149.28, corresponding to 18.82% of the average
personal income of a low-income person, a cost reduction of 43.07% compared to the poor-rich route.

Users from wealthy neighborhoods spend an average of R$ 17.28 on travel to middle-class neighborhoods. This means an increase of R$ 8.72 compared to the average price of rich-rich trips (R$ 8.56). Over the same 12 monthly trips, this gives a value of R$ 207.36, a cost increase of 101.8% compared to the rich-rich path. However, this value would correspond to 6.71% of the income of people residing in these neighborhoods. Users from poor neighborhoods traveling to middle-class neighborhoods would have an expense representing 18.82% of their income. That is, although there is a reduction in costs for residents of poor neighborhoods, the expenditure for this portion of the population would still be about three times greater than that of people residing in wealthy neighborhoods, but it would have a much lower impact than the poor-rich path, which represented almost half of the average personal income (43.07%).

Figures 16 and 17 show the average prices, the average monthly cost, and the percentage impact on Average Personal Income (MPR) of travel routes from poor to wealthy neighborhoods (POOR-RICH) and of routes between wealthy neighborhoods (RICH-RICH) for, respectively, the current scenario, in which the commercial center is concentrated in the city of Fortaleza, and a possible scenario in which the financial center were more widespread, encompassing at least the middle-class neighborhoods.

Figure 16. Paths under study in a centralized shopping center setting. Source: Author (2022).

Figure 17. Paths under study in a more decentralized shopping center setting. Source: Author (2022).

If the commercial center is decentralized so that it is closer to at least the middle neighborhoods, then there could be a travel cost reduction for users in poor neighborhoods of at least 43.07%. That would be a savings of around
R$ 149.28 per month, or R$ 1,791.36 per year. As for users who live in wealthy neighborhoods, the decentralization of commerce to medium-sized neighborhoods would raise travel costs by just over 100%. For poor neighborhoods, this monthly savings would represent 18.82% of the average monthly personal income. For wealthy neighborhoods, the increase would represent 6.71% of the average monthly personal income. This means that the impact of price increases for users in wealthy neighborhoods would continue to be smaller than the impact on poor neighborhoods when they leave their locations for a commercial center located closer to middle-class neighborhoods. For the poorest neighborhoods, however, there would be greater savings compared to the current location of the commercial center, and the income of the richest would be affected by less than 7%.

Considering the indicators found, one option for decision-making would be for the Brazilian federal government to develop public policies to promote trade in financially poorer regions. As evidenced for the city of Boston, some lower-income regions can attract people through commercial activities such as bars, sports stadiums, parks, and restaurants. Although these areas of Boston lack the financial power of the city's commercial center, they still manage to mitigate some of the high prices of transportation services in the context of Uber. For the city of Fortaleza, tax incentives could be proposed for entrepreneurs in low- and middle-income regions, stimulating these regions economically and enabling a reduction in travel costs in transport by application. In addition, the state government could work together with policies promoted by the federal government to increase the practical efficiency of these incentive measures. In this way, the financial impacts would be more likely to materialize, reducing the share of average monthly personal income that low-income residents spend on travel and easing the economic aspect of social
inequalities.

CONCLUSIONS

In this work, analyses were made through a Data Science process that considered socioeconomic data and travel prices for transport by the Uber app. More precisely, the behavior of prices was studied across different travel routes of this company. Thus, it was possible to observe the impact of these prices on the Average Personal Income of users residing in low- and high-income regions. Based on these observations, a course of action was proposed, consisting of a policy of greater tax incentives for Brazilian entrepreneurs in order to promote trade in low- and middle-income regions, stimulating these regions economically and enabling a reduction in travel costs in transport by application. In this way, a lower financial concentration, with commerce extending into financially poorer regions, could reduce the prices of transport trips by the Uber app, reducing the share of Average Personal Income spent by financially poorer users without, however, strongly impacting the Average Personal Income of the wealthiest users.

This research can also contribute to guidelines for developing public policies in the context of social inequalities, especially in the economic aspect. The results indicate a cost reduction for an economically poorer population if commercial centers are more widely distributed in a region. However, the results of these analyses need further studies that can detail this behavior, since these inequalities have multifaceted characteristics.

As future work, the authors are interested in investigating other scenarios to validate the results already obtained. One can also explore other transport services, such as Moto Taxi and Public Transport, to characterize other price changes and assess their representativeness. There is also the possibility of improving the methodology adopted in this research by adding more statistical validations and other centrality measures or other validation parameters. In
addition, one can try to apply other Data Science approaches from other works to the data found here, either for a more detailed Exploratory Data Analysis or to detect other patterns. With this, it would be possible to compare results obtained by other studies with those obtained here.

REFERENCES

ALVARENGA JUNIOR, Wagner José. Hyperparametric Optimization Methods: A Comparative Study Using Decision Trees and Random Forests in Binary Classification. 2018. Dissertation (Master's in Electrical Engineering), Faculty of Engineering, University of Minas Gerais, Belo Horizonte, 2018.
AMARAL, Fernando. Introduction to Data Science: Data Mining and Big Data. 1 ed. Rio de Janeiro: Alta Books, 2016.
ANSELIN, Luc. Spatial econometrics: methods and models. In: Studies in Operational Regional Science, v. 4, 2013. Available at: httpslinkspringercombook1010079789401577991aboutbookcontent. Accessed on: 25 jun 2022.
AVRIM, Blum; HOPCROFT, John; KANNAN, Ravi. Foundations of Data Science. Cambridge University Press, 2018. 432 p.
BANCO CENTRAL DO BRASIL. Citizen Calculator. Available at: httpswwwbcbgovbracessoinformacaocalculadoradocidadao. Accessed on: 25 jun 2022.
BEZERRA, Aguinaldo et al. The preliminary exploration of uber data as an indicator of urban liveability. In: International Conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA). IEEE, 2019. p. 1-8.
BOSTON PLANNING DEVELOPMENT AGENCY RESEARCH DIVISION (BPDARD). Neighborhood Profiles. 2017. Available at: bostonplansorgresearchmaps. Accessed on: 28 jun 2022.
BORBA, Elizandro. Centrality Measures in Graphs and Applications in Data Networks. 2013. Dissertation (Master's in Applied Mathematics), Institute of Mathematics, Federal University of Rio Grande do Sul, Porto Alegre, 2013.
CORTEZ, P.; SANTOS, M. F. (2015). Recent advances on knowledge discovery and business intelligence. Expert Systems, 32, 433-434. doi: 10.1111/exsy.12087.
DONGES, Niklas. Random Forest Algorithm: A Complete Guide. Builtin, Jul 2021. Available at: httpsbuiltincomdatasciencerandomforestalgorithm. Accessed on: 28 jun 2022.
DUSI, L. A. The use of smartphone applications in individual transport: 99Taxis and Uber. 2016. Course Completion Work (Bachelor of Civil Engineering), University of Brasília, Brasília, 2016.
G1 CEARÁ. Traffic in Fortaleza is 68% slower at peak times, says survey. 2018. Available at: httpsg1globocomcecearanoticiatransitodefortalezafica68maislentoemhorariosdepicodizpesquisaghtml. Accessed on: 25 June 2022.
GOVERNMENT OF THE STATE OF CEARÁ. Available at: httpswwwcearagovbr. Accessed on: 25 jun 2022.
HAUGHTON, D. et al. (2003). Effect of dirty data on analysis results. Eighth International Conference on Information Quality, p. 64-79, 2003.
InDriver. Available at: httpsindrivercomencity. Accessed on: 05/08/2022.
FORTALEZA PLANNING INSTITUTE (IPLANFOR). Fortaleza Mobility Plan (PlanMob). 2015.
FORTALEZA PLANNING INSTITUTE (IPLANFOR). Mobility. Available at: httpscatalogodeservicosfortalezacegovbrcategoriamobilidade. Accessed on: 27 jun 2022.
MELNIK, M. Demographic and Socioeconomic Trends in Boston: What we've learned from the latest Census data. Boston Redevelopment Authority, v. 29, Nov 2011.
NHSS, NATIONAL SURVEY BY SAMPLE OF HOUSEHOLDS. Available at: httpscidadesibgegovbrbrasilcefortalezapanorama. Accessed on: 13 Nov 2021.
MUNICIPAL CITY HALL OF FORTALEZA. Human Development by neighborhood in Fortaleza. Available at: httpswwwfortalezacegovbrnoticiasprefeituraapresentaestudosobredevelopmenthumanoporbairro. Accessed on: 25 jun 2022.
QUICK, Bruno. Ideas and Business: transport by application. Sebrae, 2020. Available at: httpsbibliotecassebraecombrchronusARQUIVOSCHRONUSIDEIASDENEGOCIOPDFS510pdf. Accessed on: 25 jun 2022.
SADOUK, L.; GADI, T.; ESSOUFI, E. H. A novel cost-sensitive algorithm and new evaluation strategies for regression in imbalanced domains. Expert Systems, 2021, 38:e12680. https://doi.org/10.1111/exsy.12680.
SILVA, J.; LIMA, L.; BEZER, I. A methodology oriented to sociodemographic data for predicting Uber X prices. In: Congresso Brasileiro de Automática (CBA), v. 2, n. 1, Dec 2020.
SILVEIRA, Daniel. Income inequality grows in the Northeast and decreases in other regions, points out IBGE.
G1, Rio de Janeiro, 2020.
SONG, I. Y.; ZHU, Y. (2016). Big data and data science: what should we teach? Expert Systems, 33, 364-373. doi: 10.1111/exsy.12130.
DIÁRIO DO TRANSPORTE (TRANSPORT DIARY). Available at: httpsdiariodotransportecombr20200130maisde60dosusuariosdosaplicativosvieramdotransportepublicoeprecoestaentreos mainreasonsforexchange. Accessed on: 07 Feb 2022.
UBER. Available at: httpswwwubercomglobalptbrpriceestimate. Accessed on: 10/13/2021.
WANG, Mingshu; MU, Lan. Spatial disparities of Uber accessibility: An exploratory analysis in Atlanta, USA. Computers, Environment and Urban Systems, v. 67, p. 169-175, 2018.
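
The monthly cost and income-share figures reported for Fortaleza in the discussion of results can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the decimal points in the reported amounts were lost in extraction (for example, that "2888" stands for R$ 28.88), and the function and variable names are hypothetical, not part of the original study.

```python
# Worked check of the monthly cost shares reported for Fortaleza.
# All figures come from the text; reading "2888" as R$ 28.88 (and so on)
# is an assumption about decimal points lost in extraction.

TRIPS_PER_MONTH = 12  # 3 trips per week x 4 weeks, as adopted in the text

# Average personal income after the IPCA adjustment (R$ per month)
INCOME = {"poor": 792.86, "rich": 3089.31}

# Average one-way price per route (R$), keyed by (origin, destination)
PRICE = {
    ("poor", "rich"): 28.88,
    ("rich", "rich"): 8.56,
    ("rich", "medium"): 17.28,
}


def monthly_cost(price_per_trip, trips=TRIPS_PER_MONTH):
    """Monthly spending in R$ for a given one-way price."""
    return round(price_per_trip * trips, 2)


def income_share(cost, income):
    """Monthly cost as a percentage of average personal income."""
    return round(100 * cost / income, 2)


if __name__ == "__main__":
    for (origin, destination), price in PRICE.items():
        cost = monthly_cost(price)
        share = income_share(cost, INCOME[origin])
        print(f"{origin}->{destination}: R$ {cost:.2f}/month, {share:.2f}% of income")
```

Under these assumptions, the poor-rich route comes to R$ 346.56 per month, about 43.71% of the average personal income of a low-income resident, which matches the text. The rich-rich share computes to roughly 3.33%, which the text reports as 3.32, a difference consistent with truncation rather than rounding.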

A DATA SCIENCE APPROACH IN A SOCIOECONOMIC ANALYSIS OF PRICES FOR TRANSPORT TRAVEL BY UBER APP

G 1, B 2
1 Federal University ZZ, Brazil. Email: g
2 Federal University. Email: b

Studies that use data from the transport company Uber have shown that several factors contribute to increases in the prices of its travel services. In this context, this research aims to analyze travel routes for low-income users and contribute to reducing these prices. To this end, we sought to answer: if a financial center were closer to economically poorer neighborhoods, would the average prices of these trips change? Could this change financially improve the lives of low-income people? To answer these questions, we investigated the concentration of financial activity across territorial regions, analyzing prices and socioeconomic data from the South American city of Fortaleza, Brazil, and from the North American city of Boston, United States of America. The results show that, in a scenario with a more decentralized financial center, low-income Uber users in Fortaleza could have their trip prices reduced by about 43.07%. This reduction would represent a monthly saving of around 18.82% of their Average Personal Income. For users living in wealthy, high-income neighborhoods, this decentralization would increase travel costs to just over 100%. However, this increase would represent 6.71% of their Average Personal Income.

Keywords: Transport by Application; Socioeconomic Data; Data Science; Exploratory Data Analysis.

INTRODUCTION

According to Quick (2020), transport by apps, also known as taxi by app and paid ride, comprises digital services for passenger transport, meal transport, and delivery of various items. Participation in this market to earn extra income, or even to find a more profitable job, is a worldwide phenomenon. Some companies in this niche are Uber, InDriver, Lyft, 99, Cabify, Rappi, and iFood. In this context, some factors can
influence the supply and demand of transport by applications, such as urban and demographic characteristics, income, competition, availability of other means of transport, and the tourist flow peculiar to each city, among others (QUICK, 2020).

In this perspective, recent research has sought relationships between the socioeconomic characteristics of a region and some aspects of the Uber app transport company (SILVA, 2020). These aspects are accessibility, waiting time, neighborhood quality-of-life indicators, livability indicators for cities and neighborhoods, and urban planning. However, according to Silva (2020), these studies have not explored the price dimension in the relationship between the service offered by Uber and the socioeconomic characteristics of the places of embarkation and/or disembarkation. Through a data-oriented methodology, Silva (2020) observed that time and distance are related to the pricing of Uber's travel service in the city of Natal, Brazil, and that this would enable better supply and demand strategies for that service. However, other factors may also contribute to how these prices are formed.

In this perspective, this work analyzed prices together with socioeconomic data from the cities of Fortaleza, Brazil, and Boston, United States of America, in order to show that users who live in financially poorer neighborhoods and use Uber's travel service end up paying more than residents of wealthier neighborhoods when the destination is the commercial center. Thus, it would be possible to propose a decentralization of these centers to ease the burden on the income of the poorest users without significantly affecting users in wealthier regions. Since both cities showed the same pattern of high prices for trips to commercial centers starting from poorer neighborhoods, this research also proposes a new functionality for Uber that gives more freedom to the user of the
service: choosing a trip by means of a price bid, in which the application returns the best distances for the offered price. To achieve this functionality, distance prediction models were created using regression algorithms and validation tests.

So far, no public data from app-based travel companies has been found for the city of Fortaleza. Due to current legislation, such as the General Data Protection Law (GDPL), disclosure of information in this context is restricted. Because of this, we chose to simulate a database using Uber's price simulation platform together with research on peak hours, weekly demand and service supply, traffic, and prices for that city. The relationships found served as evidence of the high prices offered to travelers who reside in financially poorer neighborhoods and are headed to the city's commercial center. With the socioeconomic data obtained for the city of Boston and the results of the analyses, it was possible to show the same behavior of price increases for destinations in financial centers starting from poorer neighborhoods.

We chose Fortaleza because of the ease of finding socioeconomic data at the neighborhood level and because it belongs to the Northeast region of Brazil, sharing socioeconomic characteristics with other capitals in this region: heavy traffic, a predominantly hot climate, and income inequality (SILVEIRA, 2020). Likewise, Boston was chosen because it was the city for which the most complete and coherent neighborhood-level database was found, although its socioeconomic data are not as easily accessible as those for Fortaleza.

This research aims to analyze the travel routes of low-income users and contribute to reducing the prices of these trips in transport by the Uber app. To this end, we seek to answer: if a financial center were closer to economically poorer neighborhoods, would there be a change in average prices? Could this change financially
improve the lives of low-income people?

The rest of this paper is structured as follows. Section 2 presents information about transport by applications, emphasizing the company Uber and its importance in the market. Section 3 shows the strategy for applying the contents collected in Background and Related Works to carry out the analysis, and the procedure for representing the observations. Section 4 analyzes travel data for both cities; in that section, we analyze socioeconomic income data at the neighborhood level to highlight possible relationships between travel prices and regions of high and low commercial concentration. In Section 5, we summarize the results obtained from the previous analyses and discuss the possible contributions of this research.

BACKGROUND AND RELATED WORKS

Subsection 2.1 explains the socioeconomic data of Fortaleza, and Subsection 2.2 discusses related works.

Socioeconomic Data of Fortaleza

This subsection deals with obtaining socioeconomic data for the city of Fortaleza. This research focused on the income dimension for the two cities so that relationships between travel prices and per capita income could be obtained, giving the study a uniform financial perspective.

The city of Fortaleza, capital of the state of Ceará, lies on the shores of the Atlantic Ocean in the Northeast region of Brazil, in a tropical climate zone marked by high humidity. The local economy is relatively diversified, with significant secondary and tertiary activities. Its territory is highly sought after by tourists due to its beaches and rich local culture (GOVERNO DO CEARÁ, 2022). Data from the Brazilian Institute of Geography and Statistics (BIGS) in 2010 revealed that, for income, one of the three main components of the Human Development Index (HDI), the average per capita value in Brazilian currency in Fortaleza increased by 85.18% over the last two decades, from R$457.04 in 1991 to R$610.48 in 2000 and R$846.36
in 2010. This index, called HDI-R, represents the average monthly income per capita by neighborhood. Extreme poverty, measured as the proportion of people with a per capita household income of less than R$70.00, went from 15.25% in 1991 to 9.02% in 2000 and 3.36% in 2010. According to Silveira (2020), the HDI-R is considered an indicator of the average potential of a neighborhood's residents to obtain goods and services. It is used as an indicator of people's ability to secure a standard of living that meets their basic needs. Table 1 shows five neighborhoods of each income class (rich, poor, and middle) for the city of Fortaleza in terms of HDI-R in 2010. The values refer to people aged ten years and over.

Table 1: HDI-R of some neighborhoods of different income classes in Fortaleza

Class    Neighborhood                HDI-R    Personal average income (R$)    Class average (R$)
Rich     Meireles                    0.953    3659.54                         3099.72
Rich     Aldeota                     0.778    2901.57
Rich     Dionísio Torres             0.722    2707.35
Rich     Mucuripe                    0.732    2742.25
Rich     Guararapes                  0.95     3488.25
Poor     Conjunto Palmeiras          0.01     239.25                          301.88
Poor     Parque Presidente Vargas    0.014    287.92
Poor     Canindezinho                0.025    325.47
Poor     Genibaú                     0.027    329.98
Poor     Siqueira                    0.026    326.80
Middle   Autran Nunes                0.032    349.74                          591.76
Middle   Dendê                       0.115    633.44
Middle   Parque Dois Irmãos          0.093    557.84
Middle   Cajazeiras                  0.155    768.93
Middle   Messejana                   0.12     648.89

Source: Adapted from the Municipal Secretariat for Economic Development of Fortaleza, based on data from the 2010 Demographic Census.

The neighborhoods in the worst situation in terms of HDI-R are Conjunto Palmeiras, Parque Presidente Vargas, Canindezinho, Siqueira, and Genibaú. One factor contributing to these numbers was rising unemployment in recent years. This rate was the seventh-highest among the country's metropolitan regions in the same period. According to data obtained by the NHSS in 2021, the state's unemployment rate reflects the deterioration of the labor market amid the new Coronavirus pandemic. The Covid-19 pandemic reversed the recovery trend in economic activity, resulting in a
significant drop in the growth rate in 2020.

Related Works

According to Wang et al. (2018), studies have used travel time as a comparative measure to understand the imbalance or balance between employment and housing, and racial, economic, and gender disparities in urban areas. The authors framed their research around two questions: whether waiting time can serve as a proxy measure of accessibility in Uber's travel service and, considering Uber as a virtual transport infrastructure, whether the company is associated with sociospatial polarization across neighborhoods or with more equitable access regardless of socioeconomic profile. The results of Wang et al. (2018) indicated that, for UberX, the estimated average waiting time is around 3 to 10 minutes, with a standard deviation of around 1 to 3 minutes. For UberBlack, the estimated average waiting time is around 3 to 13 minutes, with a standard deviation of around 1 to 3 minutes. The authors point out that it is not surprising that UberX has a more concentrated distribution with a lower average value than the other service, as UberX is more popular and cost-effective, which likely results in more services of this type in the market. In addition, UberBlack cost at least three times more per minute than UberX and four times more per mile.

Bezerra et al. (2019) explored using estimated-arrival-time data from Uber ride requests as a simple indicator of urban livability. Due to its nature, scale, and coverage, Uber provides objective data on the interaction between a city's inhabitants and its infrastructure, mainly transport infrastructure. This makes it possible to compare data at multiple levels, such as cities and neighborhoods, and also yields context-sensitive data that provide insight into the impact of other factors affecting Uber drivers and trips, including traffic incidents, weather, and other events. In this perspective, Bezerra et al. (2019) investigated the possibility of using
Uber data to provide a simple, fast, low-cost, time- and context-sensitive indicator of urban livability. For this, the authors used the Uber Ride Request (URR) API for the Brazilian city of Natal.

To test the hypothesis that the pricing of Uber's travel services is related to the socioeconomic characteristics of the places where trips start, Silva et al. (2020) carried out a study for the same city of Natal. They collected UberX trip prices throughout 2018, in addition to socioeconomic data at the level of Human Development Units provided by the Atlas of Human Development in Brazil. With these data, predictive models were built using Machine Learning techniques and later submitted to regression analysis.

These previous works sought to highlight relationships between information about Uber trips and socioeconomic characteristics in specific locations. These relationships covered price and distance behavior analyzed together with social factors that could influence the supply and demand of Uber's travel services. Their findings indicated that some socioeconomic characteristics of places could influence how prices are formed. Given this, it is possible to analyze how these prices behave on travel routes between regions with different income levels when the destination of the travel request is a region's financial center. From this perspective, the present study seeks indications that a non-concentration of financial centers can help reduce Uber's travel prices for users residing in economically poorer regions. Analyzing other regions with other economic contexts could indicate that a concentration of commercial activities influences Uber's trip pricing process.

PROPOSED METHOD

The process adopted in this execution strategy was adjusted considering the context
of the research, which involves socioeconomic data and travel prices in transport by application. The steps are: Obtaining UberX Pricing Data, Obtaining Socioeconomic Data, Exploratory Data Analysis, Cleaning and Processing Data, Creating Predictive Models, and Obtaining Results. Figure 1 illustrates the flow of these steps.

Figure 1: Process flow used in the research. Source: Author (2022).

The process flow above can be used in different application contexts. For this study, however, the Uber Price Data Collection and Data Cleaning and Processing steps were adapted to the research objectives. In addition, the step of Obtaining Socioeconomic Data consists of collecting social information related to a locality or region. This stage has several dimensions: education, age, income, ethnicity, and employment, among others.

To obtain Uber price data, a strategy was developed to create a database containing the prices of Uber trips in the city of Fortaleza. These prices needed to be consistent with those actually practiced by the service, so research was carried out to gather information supporting the logic used to create them. Prices for trips in Boston, on the other hand, did not need to be simulated, since a real database containing this information was available.

For the Data Cleaning and Treatment stage, a statistical analysis was carried out to validate the database created for Fortaleza, so that the simulated price values could be supported statistically. This analysis was not performed for Boston, since the prices in its database came from real trips. In addition, this step included a graph analysis using centrality measures. These measures helped analyze the impact of travel paths for different routes, making it possible to determine whether a particular neighborhood would have a greater or lesser impact on the
average price of trips if a trajectory were changed.

In this context, five neighborhoods from each of the rich, middle, and poor classes were selected for the experiment in Fortaleza. The selection criterion took into account both income and location. For income, the Average Personal Income of residents aged ten years or older, presented in the BACKGROUND AND RELATED WORKS section, was used. Thus, five representatives of each class were extracted considering their distances from the wealthy region (the commercial center, encompassing the geographic North and East of the city), the median region (the geographic center of the city), and the peripheral region (the geographic South and Southeast of the city). As found in the research, Fortaleza concentrates its wealthy neighborhoods in the North and East, which coincide with the commercial center and have residents with high per capita income. The geographic center of the city coincides with neighborhoods of intermediate per capita income, and the peripheral region (South and Southeast) contains the majority of low-income neighborhoods.

For Boston, some neighborhoods and regions within neighborhoods were selected. The choice took into account their geographic distance from the city's financial center. The aim was to observe the behavior of high and low prices when travel requests came from neighborhoods or regions farther from or closer to this financial center. With the results of this analysis, it would be possible to show a price trend similar to that observed in Fortaleza. The choice also took into account how readily per capita income information was available for each neighborhood or region. In total, 12 neighborhoods and regions of the city were used, as described in the BACKGROUND AND RELATED WORKS section.

Obtaining Pricing Data from Uber

The prices analyzed were for the UberX service type. This was
determined for the following reasons. As noted earlier, UberX is Uber's most popular service and offers more affordable prices for the population. This allows users of various income groups to access it, broadening the range of service users. The database of real trip prices obtained for Boston contains information on the UberX service. Because of this, the simulations used to create the Fortaleza database considered the same service type, allowing more homogeneous observations for both cities.

To determine the trip prices that make up the simulated Fortaleza database, the Uber price simulator and research on the city's particular characteristics were used. This made it possible to generate price samples for trips between neighborhoods, including trips that start and end in the same neighborhood. In addition, research has found evidence that travel prices vary on certain days and times. Accordingly, when necessary, prices were readjusted according to the time and day, since the price tends to increase at peak times and on certain days.

Uber's pricing simulator applies the company's pricing rules, taking into account location, base rate, dynamic rate, and variations in supply and demand. However, to make the pricing logic more realistic, it was necessary to consider information specific to the locations under study. Through this research, it was observed that Uber trips in the city occur with a division between neighborhoods of about 4%. That is, there is an approximate share of 4% of trips for routes between poor and rich, poor and middle, and middle and rich neighborhoods, and other possible routes between them. Trips originating in middle-income neighborhoods are more likely to have the same neighborhood as a destination, so a higher share of about 48% was defined for these cases. That is, 48% of trips tend to occur within the same neighborhood. This value ensures that the
largest share of travel happens within the same region. This percentage could be different, as long as it is large enough to establish that most travel takes place in the same region. In the other alternatives, the destination of a user in this neighborhood could be any other neighborhood in the city, with each of the other possibilities assigned a portion of 16% of the total trips (48% to 64%, 64% to 80%, and 80% to 96%).

As the price is related to the day and time, it was also necessary to include logic consistent with the city's reality. The research found that Uber trips happen primarily on weekends and that more requests occur from 7 am to 2 pm and from 5 pm to 8 pm (IPLANFOR, 2015; G1 CE, 2021). In parallel, simulations were carried out with the company's price simulator for each day of the week: every 15 minutes, a price sample was collected for all neighborhoods considered in this study. This process was carried out for two weeks between the end of August and the beginning of September 2021, a period without holidays or significant events in the city. After that, the arithmetic mean for each route was calculated. With the simulated values in hand, it was possible to obtain a range of price values for each route under study.

With the above considerations about day of the week and time of day, it was possible to build a database more consistent with the city's reality, containing the prices of simulated trips and other attributes. In total, a sample of 100,000 travel prices was generated. If the origin and destination are different, the price range corresponds to the other prices simulated by the Uber price simulator. Although this logic is repeated for the other routes and price ranges, it is necessary to consider the price adjustment factor, which is specific to each location. Because of this, a readjustment function called reajustepreço was
implemented; it is called when the price list corresponding to each travel route is created. According to this research, Fortaleza has its own pattern of variation in Uber trip prices depending on the time of day. The adjustment logic therefore considered an increase factor between R$1.00 and R$2.00, which are the average values found for this type of variation. In the algorithm above, the logic checks the peak-hour intervals that occur during a day in that city. Depending on the value of the random seed, the day will or will not be a weekend, and this causes the readjustment function to be called when relevant. Figure 2 illustrates a flowchart of these steps.

Figure 2: Flowchart for building the Fortaleza database. Source: Author (2022).

For Boston, prices are already recorded in the real database provided on the Kaggle platform. This database has 46 columns containing information on trip origin and destination, weather, month, distance, time, day, price, and other trip details. It contains just over 600,000 rows and refers to data collected in 2018. The trip prices in that city's database can provide another indication of price increases for routes destined for a region's financial centers. In this way, another economic context (a region in a developed country) can substantiate the indication that Uber trips are priced higher when the destination is a financial center.

Obtaining Socioeconomic Data

Uber does not provide socioeconomic information on the users of its travel services. An alternative means of obtaining such data is to use the characteristics of the place where these services are requested. This study primarily used information on people's income by neighborhood. In this way, it would be possible to analyze the financial impacts on users who leave the neighborhoods they live in,
considering routes between neighborhoods of different classes. In addition, other general information about the study sites, presented in the BACKGROUND AND RELATED WORKS section, was considered in the analyses.

In this context, data from the Human Development Index (HDI) were used for Fortaleza. These data are based on the 2010 Brazilian Demographic Census, the last one carried out at the neighborhood level. For research purposes, information on HDI-R, HDI-B, and average personal income was used for five neighborhoods of each class (rich, poor, and middle) in Fortaleza. The geographic locations were chosen so that the division showed distinctions in residents' incomes, since in this city, the farther a neighborhood is from the commercial center, the lower its income. Table 1 in the Background and Related Works section displays the HDI-R and average personal income. HDI information for all neighborhoods in the city can be obtained from FORTALEZA (2022).

Because the average personal income data are from 2010, they were converted to real values for 2021, the period in which the travel price simulations were carried out. This mitigates, as far as possible, the differences in inflation, interest rates, and economic changes over this period. For this, the Citizen Calculator provided by the Central Bank of Brazil was used, which converts monetary values in Reais from one year to another (BANCO CENTRAL DO BRASIL, 2022). For the conversion, the Broad National Consumer Price Index (IPCA) for 2021 was considered since, according to the literature, it is the index most used for monetary correction. The IPCA portrays the broader scope of inflation in the economy.

The socioeconomic information obtained for Boston was restricted to data collected in 2017 by the Boston Planning
and Development Agency Research Division (BPDA-RD). This agency conducted a survey formalized in a document called Neighborhood Profiles. From it, some neighborhoods and regions of the city were extracted to obtain evidence of an upward or downward trend in prices, taking the city's financial center as the reference point. Information in this context can be found in the Background and Related Works section.

Data Cleaning and Treatment

According to Haughton et al. (2003), the Data Cleaning and Treatment steps are relevant because unprocessed data are usually non-uniform and unpredictable. In addition, data can drive up costs when used raw. From this perspective, missing data were analyzed for Boston. The analysis showed that they came from different origins that lacked values for specific attributes. For this reason, and because the missing data represent a small fraction of the entire database, it was decided to eliminate them. On the other hand, there were no missing data for Fortaleza, since its database was generated in a controlled environment through simulations, research, and mathematical calculations.

Another important point regarding data cleaning and treatment is the detection of outliers, as they can produce invalid data that do not portray reality. These points show a significant numerical departure from the other observations. In our context of UberX pricing data, outliers appear as very high prices relative to the mean and median for Boston. Statistical techniques such as the Z-score can detect these points. The Z-score indicates the distance between a point and the sample mean, measured in standard deviations, which mitigates the influence of the data's location and scale. This technique was chosen because the Boston database is in
the order of hundreds of thousands of rows. On the other hand, the Fortaleza database does not present outliers, since the company's simulator provided ranges of values close to the averages obtained later.

Another essential point in data treatment is checking the distribution of the sample data. Many statistical methods assume that the data follow a Normal Distribution. However, much real data is not normally distributed and violates the assumptions of tests such as Student's t-test. We therefore checked whether trip prices on the routes under study followed a Normal Distribution in the Fortaleza database. The routes under study were poor-rich, poor-middle, and middle-rich. Each route was checked for adherence to a Normal Distribution, followed by the Kolmogorov-Smirnov normality test for validation. The results confirmed that travel prices in this simulated database did not follow a Normal Distribution. Because of this, the nonparametric Wilcoxon test was used.

The Wilcoxon test is commonly treated as a nonparametric version of Student's t-test for paired samples. It tests whether the distribution of differences between two samples is symmetric and centered on zero. The pairs considered were the poor-rich and poor-middle routes mentioned above. Considering a confidence level of 95% and a significance level (alpha) of 5%, the test was run with the alternative hypothesis (right-sided test) that the average price of the poor-rich route is higher than the average price of the poor-middle route. Thus, it would be possible to verify whether there is statistically significant evidence that the average price of the poor-rich route is higher than that of the poor-middle route; in other words, whether the groups differ in a statistically significant way.

As the literature states that data on the provision of services involving monetary values are generally not predictable, due
to the very nature of how these values are formed (for example, app-based transport trips), the statistical tests were performed only for the Fortaleza database, since it was a simulated database that needed this verification to confirm that its data are realistic. The Boston database was not statistically tested in this context, not only for these reasons but also because it is a real database of app-based transport trip prices, which served to support the observations made for Fortaleza.

Considering the weights of the databases under study and the fact that the Fortaleza database is simulated, some centrality measures for graphs were also computed for this database to corroborate the realism of its data and of the research carried out for this city. First, the data from the Fortaleza database were converted into a weighted graph in which the edge weights are the average prices of trips between the neighborhoods considered in the study. The routes between these neighborhoods followed the database-creation logic described earlier. The neighborhoods are listed in Table 1. To convert the database into a graph, the networkx library was used, which allowed manipulation of nodes and edges and assignment of the price as the edge weight. This library also made it possible to render the graph so that the paths between the neighborhoods under study were visible with their respective weights. This made it possible to apply centrality metrics for graphs; the methods for computing them are also available in networkx. The metrics used in this research were Degree Centrality, Closeness (Proximity) Centrality, and Betweenness (Intermediation) Centrality. The nxviz library was used to create the plots with the results of the centrality metrics.

Creating Predictive Models

As previously stated, the database for the city of
Boston is real. That is, it is a compilation of information about various UberX trips between the neighborhoods of this city. The analyses on this base support the studies on UberX trips in Fortaleza by providing evidence that, even in another context (another country, with different socioeconomic conditions, climate, and other aspects), many price-related behaviors can be similar regardless of location. The Boston database has attributes for each trip such as time, day, month, origin, destination, price, distance, and weather. Considering that this research focuses on travel prices, an analysis was carried out using a Correlation Matrix to verify which pairs of attributes were most correlated. Figure 3 illustrates this matrix.

Figure 3. Correlation matrix for the Boston database. Source: Author (2021).

In this matrix, the closer to 1 the value at the intersection of two attributes, the more positively correlated they are. Conversely, the closer to -1, the more inversely correlated they are, and values near 0 indicate little linear correlation. It is observed that price and distance are the pair of attributes with the highest correlation. Given this and the findings for the city of Fortaleza, price and distance stand out as the attributes that most characterize the UberX travel service, ahead of others such as weather, time, and day.

Considering this situation and the analysis of trend convergence for the prices of both cities, a new functionality was proposed for the Uber company. It was proposed because Uber offers trips with their prices already stipulated. However, in this market segment, some companies adopt the inverse mechanism, in which the user offers a price and the transport application returns the best distances for that offered price. An example of this strategy is the company InDriver (2022). In this sense, the creation of Machine Learning models was proposed to predict the dependent variable distance based on the independent variable price and
other independent variables in the Boston database. In this way, it would be possible to give customers more freedom to choose the price of a trip and, consequently, enable a reduction in costs.

The Machine Learning algorithms used to create these models were chosen because they are widely used in the regression literature, considering the data domain of this research. The algorithms are Simple Linear Regression, SGD Regression, Decision Tree, and Random Forest. The accuracy of the models was measured with the R² Score. The base variables were standardized with Standard Scaler. Cross-validation was performed with Cross Val Score. The parameters used in these methods followed those recommended by the literature for the domain of the values involved and the size of the database. For the Train Test Split method, a Test Size of 20% and a Train Size of 80% were used.

EXPERIMENTS AND DISCUSSION OF RESULTS

This section describes and discusses the results for the cities under study. The results obtained for the distance prediction models for the city of Boston are described and discussed, as well as the behavior of prices considering changes in travel routes in the city of Fortaleza. For Fortaleza, the results of the statistical analyses and centrality measures are also described and discussed. Finally, a possible change in the prices of Uber's trips in Fortaleza is discussed for a scenario with a greater distribution of its financial center among the neighborhoods.

Boston

A scatter plot (Figure 4) was plotted to show price variability with respect to travel distance. In this way, it would be possible to verify how many outliers were present in the base and whether distance and price increase proportionally.

Figure 4. Dispersion between price and distance for Uber rides in Boston. Source: Author (2021).

There are trips on Uber with longer
distances and lower prices, but there are also trips with shorter distances and higher prices. The results previously observed in the Correlation Matrix help to understand this behavior: distance and price are the most correlated of all pairs of base variables.

Figure 5 indicates average prices per request destination. This graph shows that trips to Fenway, the Financial District, Boston University, and Northeastern University tend to be more expensive than trips to other neighborhoods.

Figure 5. Average prices by travel destination. Source: Author (2021).

The Financial District is the city's financial center, which may explain its higher prices. Boston University and Northeastern University attract many students who use their shuttle services. Because of this, although these universities have a slightly higher price than other destination neighborhoods, they have a lower average price than trips to the financial center. This may indicate that neighborhoods a little further away from the financial center allow lower travel prices than those charged for the financial center, even though these neighborhoods have attractions such as universities, hotels, museums, and other high-demand locations.

After the Exploratory Data Analysis and the weighting of the Correlation Matrix, Machine Learning models were created considering the distance variable as dependent on the others in the base. As the data are continuous and are being treated by regressors, the Coefficient of Determination (R²) was obtained for each regressor. For this, the cross-validation technique was used with 5 folds and R² as the evaluation parameter. Five values of R² were thus obtained for each regressor under study, and their respective averages were calculated. The
results can be seen in Table 2 below.

Table 2. Results of the average Coefficients of Determination.

TECHNIQUE            R² average (%)
LINEAR REGRESSION    8
SGD REGRESSION       7
DECISION TREE        91
RANDOM FOREST        94

Source: Author (2021).

Random Forest is one of the most used Machine Learning algorithms for regression on continuous-domain datasets and for binary classification. Because it combines the predictions of a set of Decision Trees to produce a single output, it tends to perform better than any single tree of the model in isolation, owing to the reduction in variance (ALVARENGA, 2018). Table 2 shows that this algorithm obtained an average R² of 94%, the highest among the techniques used. This means that the model fits the data well and that the predictor variables explain 94% of the variation in the predicted distance data.

Fortaleza

One of the main factors determining the socioeconomic disparities between the studied neighborhoods of Fortaleza is the HDI. Figure 6 shows the indices, from the most developed to the least developed neighborhoods.

Figure 6. HDI of the neighborhoods under study for Fortaleza. Source: Author (2021).

Similarly, Figure 7 illustrates the differences in the average personal income of inhabitants aged ten years and over in each neighborhood under study.

Figure 7. Average income of inhabitants of some neighborhoods in Fortaleza. Source: Author (2021).

Figure 8 shows the disparity in the average prices of trips according to origin.

Figure 8. Average price per neighborhood of origin of the service request. Source: Author (2021).

Users from poorer and less developed neighborhoods in Fortaleza end up spending more on commuting via app transport than those departing from wealthier neighborhoods. For example, considering the poor neighborhood Canindezinho, the average price for a user who requests a trip with that neighborhood as the origin is R$ 17.23, the
destination being any of the other neighborhoods under study.

Figure 9 illustrates the average price according to the time of travel. Note how much more advantageous it is to use the Uber service at night, from 8 pm, in Fortaleza.

Figure 9. Average price of trips per hour in Fortaleza. Source: Author (2021).

Figure 10 illustrates the average prices for each class of the neighborhoods under study: poor, medium, and rich. Thus, it is possible to analyze the behavior of prices from a more holistic view, distinguishing travel routes by neighborhood class.

Figure 10. Average prices per route from poorer to richer neighborhoods. Source: Author (2022).

Figure 10 shows that the route between poor and rich neighborhoods has the highest average prices, whereas the route between the medium and wealthy neighborhoods has the lowest. This corroborates the research for this city: middle-income neighborhoods, in addition to being geographically closer to the commercial center and the wealthiest neighborhoods, have a lower cost when requesting Uber trips than poor neighborhoods. As a complement to Figure 10, Figure 11 illustrates the inverse path: travel from wealthier neighborhoods to poorer neighborhoods.

Figure 11. Average prices per route from richer to poorer neighborhoods. Source: Author (2022).

Figure 11 shows that the upward behavior of prices is repeated, but with higher averages. This may indicate that users returning to poorer neighborhoods usually pay even more than on the outbound trip. This can be related to security issues, mobility, and supply and demand.

After the Exploratory Data Analysis, a statistical analysis was performed for the Fortaleza database. Its purpose was to verify the validity of this simulated base in statistical terms, so that the information extracted would be more consistent with reality. In
this sense, it was verified whether the samples of the studied routes followed a Normal Distribution. Each sample consisted of 1000 trips for the desired route filter: poor-rich, poor-medium, and medium-rich. The first path verified was poor-rich. The p-value was smaller than alpha (α). This means that the null hypothesis of normality is rejected and that this distribution is not Normal. The other routes under study showed the same behavior, following a non-normal distribution. Table 3 contains the p-values for the routes under study, considering the alpha value assigned.

Table 3. P-values for the routes under study.

TRAVEL PATH       P-VALUE (α of 5%)
POOR TO MIDDLE    3.9E-06
POOR TO RICH      2.7E-55
MIDDLE TO RICH    1.3E-06

Source: Author (2022).

As the p-values were less than the alpha significance level, the probability of obtaining data like these under a Normal Distribution is tiny. Thus, it can be concluded that travel prices do not follow a Normal Distribution, and for this reason Wilcoxon's nonparametric test was used. The pairs considered for this test were the poor-rich and poor-medium paths. Considering a confidence level of 95% and an alpha significance level of 5%, the test was implemented with a right-sided alternative hypothesis: that the average price of trips on the poor-rich route is greater than the average price on the poor-medium route. The resulting p-value was lower than alpha, which is statistically significant evidence that the average price of travel from poor to rich is higher than the average price of travel from poor to medium. In other words, the groups differ statistically in a representative way, corroborating the histogram of average travel prices between rich, poor, and medium neighborhoods.

Then, these samples were analyzed with some centrality measures for graphs. This analysis was important for emphasizing how important a neighborhood can be in influencing
the pricing of app-based transport trips. The samples under study were first converted into graph-type objects. After the conversion, it was possible to plot the routes in terms of their travel connections with their respective average prices. For a positioning more consistent with reality, a graph of these neighborhoods was plotted on a map of the city of Fortaleza. For this, each neighborhood's latitude and longitude were collected to preserve real distances and proportions, making it possible to visualize the distances between the neighborhoods geographically. Figure 12 illustrates this representation.

Figure 12. Graph of the neighborhoods under study on a map of Fortaleza. Source: Author (2022).

Figure 12 shows the great distance between the poor neighborhoods in the Southwest part of the map and the wealthy neighborhoods in the Northeast portion. The medium neighborhoods are represented by the nodes on the transverse line on the map.

After converting the samples into a graph, it was possible to determine the centrality measures. The measures used were Degree Centrality, Proximity Centrality (Closeness), and Intermediation Centrality (Betweenness). Figure 13 indicates the Degree Centrality for the data under analysis.

Figure 13. Degree Centrality for the data under analysis. Source: Author (2022).

Figure 13 shows that the poorest neighborhoods under study (Canindezinho, Parque Presidente Vargas) have a higher Degree Centrality. This means that these neighborhoods have a high frequency of travel requests and are those that minimize the average price for a given route. This reading takes into account the research context, which places the average prices on the edges, and the fact that an important node in a graph is one connected to many nodes. It shows that, in this context, a neighborhood's importance takes into account the price of the trip and not just geography.

The next measure analyzed was the
Proximity Centrality. Figure 14 illustrates its behavior.

Figure 14. Proximity Centrality for the data under analysis. Source: Author (2022).

Again, high values are noted for the poorest neighborhoods. This is because many poor neighborhoods are close to each other in average prices. Taking the poor neighborhood of Canindezinho as an example, this indicates that it is close to most of the poor neighborhoods analyzed; in other words, its average price is closer to the others. On the other hand, Cajazeiras, a middle-class neighborhood, presents a greater distance in average prices from the other medium neighborhoods analyzed. In the context of this research, this means that proximity is not restricted to geography. The poor neighborhoods considered, being geographically close, presented similar price averages, hence the high value of this measure for the poor neighborhoods and the lower values for the middle and wealthy neighborhoods, which are more distant from each other. It is essential to point out that the weight considered in the experiments was the average price, which shows that travel prices can also indicate the proximity between places from a price perspective and not only from a geographic perspective.

Finally, the Intermediation Centrality was analyzed. Figure 15 illustrates its behavior.

Figure 15. Intermediation Centrality for the data under analysis. Source: Author (2022).

Figure 15 highlights middle-class neighborhoods such as Cajazeiras and Dendê, in addition to rich neighborhoods such as Meireles. This metric indicates the number of times a node acts as a bridge along the shortest path between two other nodes. As the middle-class neighborhoods are geographically located in the central portion of the city, this shows how much they are a bridge between the poor neighborhoods in the South and Southwest and the wealthy neighborhoods in the North and Northeast. The presence
of the wealthy Meireles neighborhood with a high value for this metric reflects its geographical position between the wealthy neighborhoods and the commercial center, indicating that many routes pass through it. It is important to note that, as the edge weight in the graph is the price of the trip, a bridge neighborhood may be at a greater distance than another neighborhood considered a bridge, given the possibility of a trip with a lower price but a greater distance, depending on the day, time, and demand for the app-based transport service.

After analyzing the data from the Fortaleza database, as well as the socioeconomic study for this city, it was observed that there is a high concentration of income and a consequent geographic segregation of the population: people with greater purchasing power occupy the East of the city, which concentrates the noblest neighborhoods and the commercial center, and those with less purchasing power occupy the South and West zones. In addition, there was daily overcrowding of the bus-based public transport system and congestion on the main roads. A large part of this problem is explained by the concentration of most economic activities, and consequently of the largest number of jobs, in the North and Northeast regions. In contrast, most of the population resided in the West and South zones of the city, hence the need to travel long distances.

Considering this situation, 2010 data were collected on the Average Monthly Income of people aged ten years and over, by neighborhood. However, as these socioeconomic data are from 2010, it is essential to adjust their values to better match the prices currently practiced in the market. Considering that the simulations of UberX trips in Fortaleza were carried out in 2021, these income values were converted. The conversion of the Average Monthly Income values was carried out using
the general IPCA inflation index for 2021, considering the weights seen in the BACKGROUND AND RELATED WORKS section. After the conversion, the values for 2021 were as follows:

Average Personal Income of all wealthy neighborhoods: R$ 3,089.31
Average Personal Income of all medium neighborhoods: R$ 1,221.43
Average Personal Income of all poor neighborhoods: R$ 792.86

A survey carried out in 2019 by the National Association of Urban Transport Companies (NTU), which brings together more than 500 urban and metropolitan bus companies throughout Brazil, shows that transport by apps attracts more people who habitually use only public transport than people who frequently used their cars (DIÁRIO DO TRANSPORTE, 2020). According to the survey, more than 60% of app users came from public transport. The survey focused on ten Brazilian capitals, including Fortaleza, and was carried out through electronic interviews using questionnaires on social networks. A total of 1,410 questionnaires were analyzed, from October 16 to November 22, 2019. Most respondents (52%) use transport by app 2 to 4 times a week, an average of 3 trips per week. The survey showed that work is the main reason passengers use this service every day. Sporadic or weekly users use the service mainly for leisure.

Considering that most people in the city of Fortaleza need to travel to the commercial center, primarily from the peripheral neighborhoods, and that these trips have education and work as their leading causes, this research adopted an average of 3 trips per week both for people who live in a low-income (peripheral) neighborhood and for people who live in a high-income neighborhood (close to the commercial center). Considering that most months of the year have four weeks, the number of monthly trips would be 12 on average. In this sense, considering transportation by the Uber app, the research showed that the average price of trips in
low-income (poor) neighborhoods among themselves (R$ 13.54) is lower than the average price of trips from poor neighborhoods to higher-income neighborhoods, that is, the poor-medium route (R$ 16.57) and the poor-rich route (R$ 28.88). One proposal to reduce the cost of Uber trips for users in the poorest neighborhoods would be a greater decentralization of the city's commerce, encompassing more of the medium and poor neighborhoods, since the city's commerce is largely concentrated in its North and Northeast region. From this perspective, the commercial center lies around the wealthy neighborhoods.

Users in poor neighborhoods spend an average of R$ 28.88 to get to the commercial center one way (the user may return to their neighborhood by another means of transport). If a user in a poor neighborhood makes on average 12 trips a month to the commercial center, the monthly cost would be R$ 346.56. This corresponds to about 43.71% of the average personal income of a person residing in a low-income neighborhood. On the other hand, users in wealthy neighborhoods spend an average of R$ 8.56 per trip to the commercial center, a region largely located within these same neighborhoods (rich-rich trips). With the same average of 12 trips a month, the monthly cost would be R$ 102.72. This corresponds to about 3.32% of the average personal income of a person residing in a high-income neighborhood. This means that users in poor neighborhoods spend about three times more on Uber trips in Fortaleza than those who live in wealthy neighborhoods when the destination is the city's commercial center.

In addition, users from poor neighborhoods spend an average of R$ 16.57 on travel to medium neighborhoods. This means a reduction of R$ 12.44 compared with the average travel price from poor to rich. Per month, considering the same 12 trips, this would give a value of R$ 149.28, corresponding to 18.82% of the average
personal income of a low-income person: a cost reduction of 43.07% compared with the poor-rich route.

Users from wealthy neighborhoods spend an average of R$ 17.28 on travel to middle-class neighborhoods. This means an increase of R$ 8.72 compared with the average price of rich-rich trips (R$ 8.56). Per month, considering the same 12 trips, this would give a value of R$ 207.36, a cost increase of 101.8% compared with the rich-rich path. However, this value would correspond to 6.71% of the income of people residing in these neighborhoods. Users from poor neighborhoods traveling to middle-class neighborhoods would have an expense representing 18.82% of their income. That is, although there is a reduction in costs for residents of poor neighborhoods, their expenditure in this context would still be about three times greater than that of people residing in wealthy neighborhoods, but it would have a much lower impact than the poor-rich path, which represented almost half of the average personal income (43.07%).

Figure 16 and Figure 17 show, respectively, the average prices, the average monthly cost, and the percentage impact on Average Personal Income (MPR) of travel routes from poor to wealthy neighborhoods (POOR-RICH) and of routes between wealthy neighborhoods (RICH-RICH) for the current scenario, in which the commercial center is concentrated in the city of Fortaleza, and for a possible scenario in which the financial center is more comprehensive, encompassing at least the middle-class neighborhoods.

Figure 16. Paths under study in a centralized commercial center setting. Source: Author (2022).

Figure 17. Paths under study in a more decentralized commercial center setting. Source: Author (2022).

If the commercial center is decentralized so that it is closer to at least the medium neighborhoods, there could be a travel cost reduction for users in poor neighborhoods of at least 43.07%. That would be a savings of around
R$ 149.28 per month, or R$ 1,791.36 per year. As for users who live in wealthy neighborhoods, the decentralization of commerce to medium neighborhoods would raise travel costs by just over 100%. For poor neighborhoods, this monthly savings would represent 18.82% of the average monthly personal income. For wealthy neighborhoods, the increase would represent 6.71% of the average monthly personal income. This means that the impact of price increases on users in wealthy neighborhoods would remain smaller than the impact on poor neighborhoods when they travel to a commercial center located closer to middle-class neighborhoods. For the poorest neighborhoods, there would be more significant savings compared with the current location of the commercial center, while the income of the richest would be affected by less than 7%.

Considering the indicators found, one possibility for decision-making would be for the Brazilian federal government to develop public policies to promote trade in financially poorer regions. As evidenced for the city of Boston, some lower-income regions can attract people through commercial activities such as bars, sports stadiums, parks, and restaurants. Although these areas of Boston lack the financial power of the city's commercial center, they still manage to mitigate some of the high prices of transportation services in the context of Uber. For the city of Fortaleza, tax incentives could be proposed for entrepreneurs in low- and middle-income regions, promoting economic activity in these regions and enabling a reduction in travel costs in transport by application. The state government could also work together with policies promoted by the federal government to increase the practical efficiency of these incentive measures. In this way, the financial impacts would be more likely to materialize, reducing the share of average monthly personal income that low-income residents spend and reducing the economic dimension of social
inequalities.

CONCLUSIONS

In this work, analyses were made through a Data Science process that considered socioeconomic data and travel prices for transport by the Uber app. More precisely, the behavior of prices was studied across different travel routes of this company. It was thus possible to observe the impact of these prices on the Average Personal Income of users residing in low- and high-income regions. Based on these observations, a decision-making proposal was made: a policy of greater tax incentives for Brazilian entrepreneurs in order to promote trade in low- and middle-income regions, stimulating these regions economically and enabling a reduction in travel costs in transport by application. In this way, a less concentrated financial center, encompassing financially poorer regions, could reduce the prices of transport trips by the Uber app, reducing the share of Average Personal Income spent by financially poorer users without heavily impacting the Average Personal Income of the wealthiest users.

This research can also contribute to guidelines for developing public policies in the context of social inequalities, especially in the economic aspect. The results indicate a cost reduction for an economically poorer population if commercial centers are more distributed in a region. However, these analyses need further studies that can detail this behavior, since these inequalities have multifaceted characteristics.

As future work, the authors are interested in investigating other scenarios to validate the results already obtained. Other transport services, such as Moto Taxi and Public Transport, can also be explored to characterize other price changes and assess their representativeness. There is also the possibility of improving the methodology adopted in this research by inserting more statistical validations and other measures of centrality or other validation parameters. In
addition, one can try to apply other Data Science approaches from other works to the data found here, either for a more detailed Exploratory Data Analysis or to detect other patterns. With this, it would be possible to compare results obtained by other studies with those already obtained here.

REFERENCES

ALVARENGA JUNIOR, Wagner José. Hyperparametric Optimization Methods: A Comparative Study Using Decision Trees and Random Forests in Binary Classification. 2018. Dissertation (Master's in Electrical Engineering) - Faculty of Engineering, University of Minas Gerais, Belo Horizonte, 2018.

AMARAL, Fernando. Introduction to Data Science: Data Mining and Big Data. 1. ed. Rio de Janeiro: Alta Books, 2016.

ANSELIN, Luc. Spatial econometrics: methods and models. In: Studies in Operational Regional Science, v. 4, 2013. Available at: httpslinkspringercombook1010079789401577991aboutbookcontent. Accessed on: 25 Jun. 2022.

BLUM, Avrim; HOPCROFT, John; KANNAN, Ravi. Foundations of Data Science. Cambridge University Press, 2018. 432 p.

BANCO CENTRAL DO BRASIL. Citizen Calculator. Available at: httpswwwbcbgovbracessoinformacaocalculadoradocidadao. Accessed on: 25 Jun. 2022.

BEZERRA, Aguinaldo et al. The preliminary exploration of uber data as an indicator of urban liveability. In: International Conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA). IEEE, 2019. p. 18.

BOSTON PLANNING & DEVELOPMENT AGENCY RESEARCH DIVISION (BPDARD). Neighborhood Profiles. 2017. Available at: bostonplansorgresearchmaps. Accessed on: 28 Jun. 2022.

BORBA, Elizandro. Centrality Measures in Graphs and Applications in Data Networks. 2013. Dissertation (Master's in Applied Mathematics) - Institute of Mathematics, Federal University of Rio Grande do Sul, Porto Alegre, 2013.

CORTEZ, P.; SANTOS, M. F. Recent advances on knowledge discovery and business intelligence. Expert Systems, v. 32, p. 433-434, 2015. doi: 10.1111/exsy.12087.

DONGES, Niklas. Random Forest Algorithm: A Complete Guide. Builtin, Jul. 2021. Available at: httpsbuiltincomdatasciencerandomforestalgorithm. Accessed on: 28 Jun. 2022.

DUSI, L. A. The use of smartphone applications in individual transport: 99Taxis and Uber. 2016. Course Completion Work (Bachelor of Civil Engineering) - University of Brasília, Brasília, 2016.

G1 CEARÁ. Traffic in Fortaleza is 68% slower at peak times, says survey. 2018. Available at: httpsg1globocomcecearanoticiatransitodefortalezafica68maislentoemhorariosdepicodizpesquisaghtml. Accessed on: 25 Jun. 2022.

GOVERNMENT OF THE STATE OF CEARÁ. Available at: httpswwwcearagovbr. Accessed on: 25 Jun. 2022.

HAUGHTON, D. et al. Effect of dirty data on analysis results. In: Eighth International Conference on Information Quality, 2003. p. 64-79.

INDRIVER. Available at: httpsindrivercomencity. Accessed on: 05/08/2022.

FORTALEZA PLANNING INSTITUTE (IPLANFOR). Fortaleza Mobility Plan (PlanMob). 2015.

FORTALEZA PLANNING INSTITUTE (IPLANFOR). Mobility. Available at: httpscatalogodeservicosfortalezacegovbrcategoriamobilidade. Accessed on: 27 Jun. 2022.

MELNIK, M. Demographic and Socioeconomic Trends in Boston: What we've learned from the latest Census data. Boston Redevelopment Authority, v. 29, Nov. 2011.

NHSS - NATIONAL SURVEY BY SAMPLE OF HOUSEHOLDS. Available at: httpscidadesibgegovbrbrasilcefortalezapanorama. Accessed on: 13 Nov. 2021.

MUNICIPAL CITY HALL OF FORTALEZA. Human Development by neighborhood in Fortaleza. Available at: httpswwwfortalezacegovbrnoticiasprefeituraapresentaestudosobredevelopmenthumanoporbairro. Accessed on: 25 Jun. 2022.

QUICK, Bruno. Ideas and Business: transport by application. Sebrae, 2020. Available at: httpsbibliotecassebraecombrchronusARQUIVOSCHRONUSIDEIASDENEGOCIOPDFS510pdf. Accessed on: 25 Jun. 2022.

SADOUK, L.; GADI, T.; ESSOUFI, E. H. A novel cost-sensitive algorithm and new evaluation strategies for regression in imbalanced domains. Expert Systems, v. 38, e12680, 2021. https://doi.org/10.1111/exsy.12680.

SILVA, J.; LIMA, L.; BEZER, I. A methodology oriented to sociodemographic data for predicting Uber X prices. In: Congresso Brasileiro de Automática (CBA), v. 2, n. 1, Dec. 2020.

SILVEIRA, Daniel. Income inequality grows in the Northeast and decreases in other regions, points out IBGE. G1, Rio de Janeiro, 2020.

SONG, I. Y.; ZHU, Y. Big data and data science: what should we teach? Expert Systems, v. 33, p. 364-373, 2016. doi: 10.1111/exsy.12130.

DIÁRIO DO TRANSPORTE (TRANSPORT DIARY). Available at: httpsdiariodotransportecombr20200130maisde60dosusuariosdosaplicativosvieramdotransportepublicoeprecoestaentreos mainreasonsforexchange. Accessed on: 07 Feb. 2022.

UBER. Available at: httpswwwubercomglobalptbrpriceestimate. Accessed on: 10/13/2021.

WANG, Mingshu; MU, Lan. Spatial disparities of Uber accessibility: An exploratory analysis in Atlanta, USA. Computers, Environment and Urban Systems, v. 67, p. 169-175, 2018.
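
The cost-impact figures reported for Fortaleza in EXPERIMENTS AND DISCUSSION OF RESULTS follow from simple arithmetic: monthly cost is the one-way price times 12 trips, and the income impact is that cost divided by the Average Personal Income. The Python sketch below re-derives them from the prices and incomes quoted in the text. It is only an illustration under stated assumptions: the function and variable names are hypothetical and not taken from the authors' code, and the amounts are read with two decimal places (for example R$ 28.88 and R$ 792.86).

```python
# Sketch: re-derive monthly cost and income share from values quoted in the text.
# All names are hypothetical; amounts are in R$ with two decimals assumed.

TRIPS_PER_MONTH = 12  # 3 trips per week x 4 weeks, as adopted in the text

AVERAGE_INCOME = {"poor": 792.86, "rich": 3089.31}  # adjusted to 2021

# route -> (origin class, average one-way price)
ROUTES = {
    "poor-rich": ("poor", 28.88),
    "rich-rich": ("rich", 8.56),
    "rich-medium": ("rich", 17.28),
}


def monthly_cost(price_per_trip, trips=TRIPS_PER_MONTH):
    """Monthly spending for a given one-way price, rounded to cents."""
    return round(price_per_trip * trips, 2)


def income_share(cost, income):
    """Monthly cost as a percentage of average personal income."""
    return 100.0 * cost / income


def percent_change(new_price, old_price):
    """Relative price change in percent (negative means a reduction)."""
    return 100.0 * (new_price - old_price) / old_price


def summarize():
    """Return {route: (monthly cost, percent of income)} for every route."""
    summary = {}
    for route, (origin, price) in ROUTES.items():
        cost = monthly_cost(price)
        summary[route] = (cost, income_share(cost, AVERAGE_INCOME[origin]))
    return summary


if __name__ == "__main__":
    for route, (cost, share) in summarize().items():
        print(f"{route}: R$ {cost:.2f} per month, {share:.2f}% of income")
    # Rich users rerouted to medium neighborhoods: roughly a doubling of price
    print(f"rich-medium vs rich-rich: {percent_change(17.28, 8.56):+.1f}%")
```

The income shares are left unrounded inside `income_share` because the rounding convention of the original study is not stated; the rich-rich share, for instance, comes out near 3.325%, which the text reports as 3.32%.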
