Ridesharing my way — Uber

jeh lokhande
Towards Data Science
Jan 19, 2019


2 continents

7 cities

$1,190 spent (~INR 90,000)

1,400+ miles travelled

That’s more than halfway from Atlanta to San Francisco (where I’ll be, starting next week).

Uber launched its operations in India in 2013, and I started using it heavily when I moved to Bangalore for work in 2015. The app hasn’t left my phone since.

This post analyzes my usage patterns and answers questions such as:

Should I book a Pool or an UberX at 5 PM?

Are wait times significantly longer for a shared cab?

The post might seem long, but I assure you it’s a light read!

Overview

A large chunk of my rides were in Bangalore, where I spent $0.70 per mile on average. That seemed high until I moved to Atlanta: $4.30 per mile!
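A per-city cost per mile like the figures above can be computed as total fare divided by total distance. The sketch below is a minimal illustration, not the code from the linked repository: it assumes a hypothetical list of (city, fare_usd, miles) tuples with fares already converted to USD, and the numbers in it are toy values. It uses the ratio of totals; averaging each ride’s own fare-per-mile would give a different figure, and the post doesn’t say which was used.

```python
from collections import defaultdict


def cost_per_mile_by_city(rides):
    """Return {city: total fare / total miles}.

    rides: iterable of (city, fare_usd, miles) tuples. This layout is
    hypothetical; fares are assumed to be already converted to USD.
    """
    fares = defaultdict(float)
    miles = defaultdict(float)
    for city, fare_usd, distance in rides:
        fares[city] += fare_usd
        miles[city] += distance
    # Ratio of totals, so longer rides weigh more than shorter ones.
    return {city: fares[city] / miles[city] for city in fares if miles[city] > 0}


# Toy values, not real ride data.
rides = [("Bangalore", 2.0, 3.0), ("Bangalore", 1.5, 2.0), ("Atlanta", 21.5, 5.0)]
print(cost_per_mile_by_city(rides))
```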

The sample sizes for San Francisco, Chicago, and Baltimore are small, so I won’t delve into deeper analysis or draw any conclusions for those cities.

Days spent travelling in an Uber: 6.62
Days spent dealing with Uber (wait time + travel time): 9.16
Ride time efficiency: 72.24%
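The three figures above are consistent with defining ride time efficiency as time spent riding divided by time spent waiting plus riding (6.62 / 9.16 ≈ 0.72). A minimal sketch under that assumption, taking hypothetical per-ride lists of wait and ride durations in minutes:

```python
MINUTES_PER_DAY = 24 * 60


def ride_time_efficiency(wait_minutes, ride_minutes):
    """Return (days riding, days waiting + riding, riding share of the total).

    wait_minutes, ride_minutes: hypothetical per-ride lists, in minutes.
    """
    ride_total = sum(ride_minutes)
    overall_total = sum(wait_minutes) + ride_total
    return (
        ride_total / MINUTES_PER_DAY,
        overall_total / MINUTES_PER_DAY,
        ride_total / overall_total,
    )


# Toy example: two rides with 10 and 20 minute waits, 30 and 60 minute rides.
print(ride_time_efficiency([10, 20], [30, 60]))
```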

I started by looking at a timeline of my rides.

Ride History Timeline

There’s a sharp drop in rides after I joined Tech. For one, I lived pretty close to campus. Also, as a student I preferred taking the bus over an Uber (to save money, or so I thought).

But my total spend stayed about the same, thanks to the much higher cost per mile in Atlanta!

Well, this is just Uber. Ola and Lyft would add a lot more. Fortunately, I didn’t get my data from them.

Bangalore

Usage in Bangalore revealed some really interesting patterns.

Monday evenings were for football. I Uber-ed to the field around 6 PM and back home at 8 PM, hence the dark spots.

Fridays were for Arbor and Toit (the slightly dark spots late on Friday evenings!).
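The dark spots come from a weekday-by-hour heatmap of ride counts. A minimal stdlib sketch of the counts behind such a heatmap, assuming a hypothetical list of ride request timestamps already in local time (if timestamps are stored in UTC, convert them first, or a 6 PM ride lands in the wrong cell):

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday(): Monday == 0


def weekday_hour_counts(request_times):
    """Count rides per (weekday, hour of day) cell."""
    return Counter((DAYS[t.weekday()], t.hour) for t in request_times)


def as_grid(counts):
    """7 x 24 matrix of counts: rows are weekdays (Mon first), columns are hours 0-23."""
    return [[counts[(day, hour)] for hour in range(24)] for day in DAYS]


# Toy timestamps: two Monday 6 PM rides, one Monday 8 PM ride, one Friday 10 PM ride.
times = [
    datetime(2019, 1, 7, 18, 5),
    datetime(2019, 1, 14, 18, 0),
    datetime(2019, 1, 7, 20, 10),
    datetime(2019, 1, 11, 22, 30),
]
grid = as_grid(weekday_hour_counts(times))
```

Passing `grid` to a heatmap routine such as matplotlib’s `imshow` would render the figure; darker cells are simply higher counts.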

But that’s about Bangalore.

USA

Uber only provides the trip’s start and end coordinates, so I calculated the haversine distance between them. Since no road route can be shorter than that, this gives a lower-bound estimate of the ride distance.

Haversine distance is essentially Euclidean distance on a sphere: it uses the latitude and longitude of two points to compute the shortest (great-circle) distance between them along the Earth’s surface.
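For reference, the haversine formula finds the central angle θ between two points from hav(θ) = hav(Δφ) + cos φ1 · cos φ2 · hav(Δλ), where hav(x) = sin²(x/2), φ is latitude and λ is longitude; the distance is then R · θ. A minimal Python sketch, assuming a mean Earth radius of 3,958.8 miles (the radius value is an assumption, not taken from the post):

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed value, not from the post


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # hav(theta) = hav(d_phi) + cos(phi1) * cos(phi2) * hav(d_lambda), with hav(x) = sin^2(x / 2)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # theta = 2 * asin(sqrt(a)); clamp guards against rounding pushing a above 1
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


# Approximate downtown Atlanta and San Francisco coordinates (illustrative).
print(round(haversine_miles(33.749, -84.388, 37.7749, -122.4194)))
```

As a sanity check on the intro, the Atlanta to San Francisco call comes out a little over 2,100 miles, so 1,400 miles is indeed more than halfway.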

Although I don’t have as many rides in the US as in India, I still have enough to compare shared rides vs. normal rides. Shared rides include Uber Pool and ExpressPool, whereas normal rides are everything else.

The dependence on cars in the US was glaring when I moved here. Being a stickler for green tech and reducing car use, I’ve always advocated Pool over X, XL, and Premier.

As the algorithms behind shared rides improved over the years, I increased my use of shared cabs. I bet they’ll make up well over 50% of my rides in 2019!
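Tracking that share is a simple per-year proportion. A minimal sketch, assuming hypothetical (year, product) pairs and hypothetical product labels; the product names in an actual Uber data export may differ:

```python
from collections import Counter

# Hypothetical product labels; the names in a real export may differ.
SHARED_PRODUCTS = {"Uber Pool", "ExpressPool"}


def shared_share_by_year(rides):
    """Return {year: fraction of that year's rides that were shared}.

    rides: iterable of (year, product) pairs (hypothetical layout).
    """
    totals, shared = Counter(), Counter()
    for year, product in rides:
        totals[year] += 1
        if product in SHARED_PRODUCTS:
            shared[year] += 1
    return {year: shared[year] / totals[year] for year in sorted(totals)}


# Toy rides, not real data.
rides = [(2017, "UberX"), (2017, "Uber Pool"),
         (2018, "ExpressPool"), (2018, "Uber Pool"), (2018, "ExpressPool"), (2018, "UberX")]
print(shared_share_by_year(rides))
```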

However, is there a time cost to taking a shared ride?

Cab sharing became popular in the US during World War II as a way to save costs. A second wave of cab sharing came during the 1973 oil crisis.

The question that comes to mind is: how can shared cabs ever have a lower total interaction time (wait time + ride time) than normal cabs?

The intuition behind that question holds as long as demand is lower than or equal to supply. However, once demand exceeds supply by a certain margin, it’s faster to share a cab than to take your own private one.

Statistical Analysis

Some terminology:

Short trips — Trips where the haversine distance is less than 2.5 miles

Long trips — Trips where the haversine distance is more than 2.5 miles

Day trips — Trips between 9 am and 7 pm

Night trips — Trips between 7 pm and 9 am
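These definitions translate directly into two small helper functions. The post doesn’t say which side of each boundary a trip of exactly 2.5 miles, or one starting at exactly 9 AM or 7 PM, falls on, so the choices in this sketch (2.5 miles counts as long, 7 PM starts the night) are assumptions:

```python
SHORT_TRIP_MAX_MILES = 2.5
DAY_START_HOUR = 9   # 9 AM
DAY_END_HOUR = 19    # 7 PM


def trip_length(haversine_miles):
    """'short' below 2.5 miles, otherwise 'long' (exactly 2.5 counts as long: an assumption)."""
    return "short" if haversine_miles < SHORT_TRIP_MAX_MILES else "long"


def trip_period(hour):
    """'day' from 9:00 up to (not including) 19:00, otherwise 'night'.

    hour: 0-23, local time of the trip start. Night wraps past midnight,
    so it is simply everything outside the day window.
    """
    return "day" if DAY_START_HOUR <= hour < DAY_END_HOUR else "night"


print(trip_length(1.8), trip_period(18), trip_period(2))
```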

Wait times in India are much more variable than in the US; my longest wait was 44.97 minutes. That trip, from MG Road to the airport in Bangalore at 6 PM, lasted 72.58 minutes.
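One way to quantify "more variable" is to compare the standard deviation and interquartile range of wait times per country. A minimal stdlib sketch, assuming a hypothetical dict mapping each country to a list of wait times in minutes; the values in the example are toy numbers, and `statistics.quantiles` needs Python 3.8 or later:

```python
import statistics


def wait_time_summary(waits_by_country):
    """Spread statistics of wait times (minutes) per country.

    waits_by_country: hypothetical dict of country -> list of wait times
    (at least two values each).
    """
    summary = {}
    for country, waits in waits_by_country.items():
        q1, _, q3 = statistics.quantiles(waits, n=4)  # quartile cut points
        summary[country] = {
            "mean": statistics.mean(waits),
            "stdev": statistics.stdev(waits),  # sample standard deviation
            "iqr": q3 - q1,
            "max": max(waits),
        }
    return summary


# Toy wait times, not real data.
print(wait_time_summary({"India": [5, 12, 44.97, 8, 20], "US": [4, 5, 6, 5, 4]}))
```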

Moving on to data for Atlanta only.

I didn’t have enough Atlanta rides of my own, so I asked a friend to share their data.

Are wait times and total interaction times different for shared and normal cabs?

I initially used a non-parametric test (the Mann-Whitney U test) to check for statistical significance, since I had only a small number of samples.

Additional data later allowed me to use the more widely used unpaired t-test (thank you, Central Limit Theorem!).
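Both tests are available in SciPy. A minimal sketch that runs them on two independent samples, for example wait times for shared vs. normal rides; the sample values are made-up toy numbers, not data from this post. It uses Welch’s variant of the unpaired t-test (`equal_var=False`), which doesn’t assume equal variances; that choice is an assumption, since the post doesn’t say which variant was used.

```python
from scipy import stats


def compare_groups(shared, normal):
    """Two-sided p-values for a difference between two independent samples.

    Returns (Mann-Whitney U p-value, Welch unpaired t-test p-value).
    """
    mwu = stats.mannwhitneyu(shared, normal, alternative="two-sided")
    # equal_var=False selects Welch's t-test; equal_var=True would be Student's.
    welch = stats.ttest_ind(shared, normal, equal_var=False)
    return mwu.pvalue, welch.pvalue


if __name__ == "__main__":
    # Made-up toy wait times in minutes, not data from this post.
    shared_waits = [6.1, 7.4, 8.0, 5.9, 9.2, 7.7, 8.5, 6.8]
    normal_waits = [3.2, 4.1, 2.9, 3.8, 4.5, 3.5, 2.7, 4.0]
    p_u, p_t = compare_groups(shared_waits, normal_waits)
    print(f"Mann-Whitney U p = {p_u:.4f}, Welch t-test p = {p_t:.4g}")
```

The Mann-Whitney U test compares ranks, so it stays valid with few samples and skewed wait times; the t-test compares means and leans on the Central Limit Theorem once the samples grow.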

a. Short Rides — Haversine distance less than 2.5 miles

The p-value for the wait-time test is 0.03, implying that wait times for shared rides and normal rides are statistically significantly different.

However, the t-test on total interaction time (wait time + ride time) for shared vs. normal cabs gives a p-value of 0.2.

So although wait times on short rides differ, the total time from booking the cab to reaching your destination shows no statistically significant difference.

The difference in fares is also statistically significant, strengthening the case for Pool over UberX on short rides.

b. Decently Long Rides — Haversine distance more than 2.5 miles but less than 7.5 miles

Longer rides are a little different. The difference in wait times is again statistically significant, with a p-value of 0.02.

The difference in total interaction time (ride time + wait time) is also statistically significant, though only just, with a p-value of 0.05.

As expected, fares also differ significantly between shared and normal rides.

For decently long rides, normal cabs cost $7.16 more than shared cabs on average.

So if you’re in a hurry to get to the airport and willing to shell out a few extra bucks, take an X!

Should you choose a different option based on the time of day?

Wait times are a big factor in choosing between a shared cab and a normal one, and we weigh them a lot more at night. At night (between 10 PM and 7 AM), for trips under 5 miles, statistical tests fail to reject the null hypothesis that wait times for shared and normal cabs are the same; in other words, the data showed no significant difference in nighttime wait times.
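Selecting that subset takes a little care, because the 10 PM to 7 AM window wraps around midnight. A minimal sketch, assuming hypothetical (start_hour, haversine_miles, is_shared, wait_minutes) tuples; treating hour 7 itself as daytime is also an assumption. The two resulting lists can then be compared with the same Mann-Whitney U or t-test as before.

```python
NIGHT_START_HOUR = 22   # 10 PM
NIGHT_END_HOUR = 7      # 7 AM (hour 7 itself is treated as daytime: an assumption)
MAX_NIGHT_MILES = 5.0


def is_night(hour):
    # The window crosses midnight, so this is an OR of two ranges, not one range check.
    return hour >= NIGHT_START_HOUR or hour < NIGHT_END_HOUR


def night_short_waits(rides):
    """Split wait times of night trips under 5 miles into (shared, normal) lists.

    rides: iterable of (start_hour, haversine_miles, is_shared, wait_minutes)
    tuples; this layout is hypothetical.
    """
    shared, normal = [], []
    for hour, miles, is_shared, wait in rides:
        if is_night(hour) and miles < MAX_NIGHT_MILES:
            (shared if is_shared else normal).append(wait)
    return shared, normal


# Toy rides: only the first two are night trips under 5 miles.
rides = [(23, 2.0, True, 6.0), (3, 1.0, False, 4.0), (12, 1.0, True, 9.0), (23, 6.0, False, 3.0)]
print(night_short_waits(rides))
```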

The interactive visualization below shows how the various times and fares (mean values) change with travel distance.

Subtly hinting at where I’ll be working from next month onwards.
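Curves like these come from grouping rides into distance bins and averaging within each bin. A minimal sketch, assuming hypothetical ride dicts with "miles", "shared", "wait", "total" and "fare" keys and a 1-mile bin width (the post doesn’t state the bin width):

```python
from collections import defaultdict
from statistics import mean

METRICS = ("wait", "total", "fare")


def means_by_distance_bin(rides, bin_width_miles=1.0):
    """Average each metric per (distance bin, shared?) group.

    rides: iterable of dicts with "miles", "shared", "wait", "total" and "fare"
    keys (hypothetical layout). Bins are labelled by their lower edge in miles.
    """
    groups = defaultdict(list)
    for ride in rides:
        lower_edge = (ride["miles"] // bin_width_miles) * bin_width_miles
        groups[(lower_edge, ride["shared"])].append(ride)
    return {
        key: {m: mean(r[m] for r in group) for m in METRICS}
        for key, group in sorted(groups.items())
    }


# Toy rides, not real data.
rides = [
    {"miles": 0.5, "shared": True, "wait": 8, "total": 20, "fare": 5},
    {"miles": 0.9, "shared": True, "wait": 6, "total": 18, "fare": 7},
    {"miles": 1.2, "shared": False, "wait": 3, "total": 12, "fare": 10},
]
print(means_by_distance_bin(rides))
```

Plotting each metric against the bin edges, one line for shared and one for normal rides, gives the kind of chart described above.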

E-Scooters

What if we replaced the rides that had a haversine distance of less than 2 miles with a scooter?

Number of eligible rides in Atlanta: 127

Average e-scooter speed: 7.5 miles per hour

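Since the haversine distance is the straight-line distance between a trip’s start and end points, I multiplied it by 1.5 to approximate the Manhattan (street-grid) distance between them. That is a slightly conservative factor: on a perfect grid, the Manhattan distance is at most √2 ≈ 1.41 times the straight-line distance.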
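The estimate boils down to a few lines of arithmetic. A minimal sketch: the 1.5 factor and the 7.5 mph speed come from the post, but the unlock fee and per-minute price are hypothetical placeholders (not the prices behind the savings figure), and the time estimate covers riding only, not walking to a scooter.

```python
MANHATTAN_FACTOR = 1.5    # from the post: haversine distance x 1.5
SCOOTER_SPEED_MPH = 7.5   # from the post
UNLOCK_FEE_USD = 1.00     # hypothetical price, not from the post
PER_MINUTE_USD = 0.15     # hypothetical price, not from the post


def scooter_estimate(haversine_miles):
    """Return (riding minutes, cost in USD) for replacing one ride with a scooter."""
    street_miles = haversine_miles * MANHATTAN_FACTOR
    minutes = street_miles / SCOOTER_SPEED_MPH * 60
    cost = UNLOCK_FEE_USD + PER_MINUTE_USD * minutes
    return minutes, cost


def savings_percent(uber_fares, scooter_costs):
    """Percentage saved on total spend (ratio of totals, not a mean of per-ride savings)."""
    return 100 * (1 - sum(scooter_costs) / sum(uber_fares))


# Toy example: a 1-mile (haversine) ride that cost $10 by Uber.
minutes, cost = scooter_estimate(1.0)
print(round(minutes, 1), round(cost, 2), round(savings_percent([10.0], [cost]), 1))
```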

The savings average around 30.79% (apart from environmental benefits).

Average interaction time of taking a rideshare for short distances: 13.55 mins
Average interaction time of taking a scooter for short distances: 12.41 mins

Ok, not so subtle.

Feedback and criticism appreciated!

All the analysis is based on my data (plus a friend’s for Atlanta). Results will vary. Please do not draw any generalized conclusions.

Github: https://github.com/jehlokhande93/UberDataAnalysis
