
Instacart Analysis

Updated: Oct 26, 2018


My goal in this project was to find what could be done to decrease the time between orders, one aspect of which is determining which products appear on a user's landing page. I processed and assembled data from the 2017 Instacart Online Grocery Shopping Dataset in SQL and calculated additional derived fields to enrich it. I then imported the data into Tableau to create visualizations and run some statistics.


I started by cleaning the data: setting each column to the right data type and adding foreign keys where they were missing. I calculated the date of each order by working backwards from each user's most recent order date, using the days since their previous order. I then joined all 8 tables in SQL into one massive overview table containing all of the information, so I could export it to Tableau to finish my analysis, make data visualizations, and build a statistical model.
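The backwards date calculation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SQL I ran: the function name `reconstruct_order_dates` and the example date and gaps are made up, and it assumes the first order's gap is missing (`None`) because it has no previous order.

```python
from datetime import date, timedelta


def reconstruct_order_dates(most_recent, days_since_prior):
    """Return one date per order, oldest first.

    days_since_prior[i] is the gap in days between order i and the order
    before it; the first entry is None (no previous order) and is ignored.
    """
    dates = [most_recent]
    # Walk from the newest order back to the first, subtracting each gap.
    for gap in reversed(days_since_prior[1:]):
        dates.append(dates[-1] - timedelta(days=gap))
    dates.reverse()
    return dates


# Hypothetical user with four orders, the latest placed on 2017-06-30.
order_dates = reconstruct_order_dates(date(2017, 6, 30), [None, 7, 14, 30])
```

Here the fourth order keeps the given date, and each earlier order is placed by subtracting the gap recorded on the order that followed it.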


I searched through my data for any significant trend. The two visualizations below are some of the most interesting finds involving the metric days since prior order.

Each point on the plot represents an individual order. Unfortunately, the trend has an R-squared value of 0.0305, meaning it explains only about 3% of the variation, which renders this correlation useless.

Each point on this plot also represents a single order. This trend would be very interesting, but it also has a low R-squared value, coming in at 0.0361.
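For readers who want to see what the R-squared figures measure, here is a minimal sketch for a straight-line fit in plain Python. Tableau produced the numbers quoted here; this function (`r_squared` is a name chosen for illustration) only shows the standard definition, one minus the ratio of residual variation to total variation, and assumes the x values are not all identical.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit versus total variation in y.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data: a perfect line, and a noisier set of points.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
noisy = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

A value near 1 means the line accounts for almost all of the variation in y, while values around 0.03, like the ones on these plots, mean the line accounts for almost none of it.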

Because I was seeking to determine how many of the products shown to a user on their landing page should be products they've ordered before, I decided to take the plot above to the next level by breaking it into quadrants.


The lower left quadrant of this plot had the highest R-squared value of the four, but at 0.0174 it was still lower than that of the original plot.
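Breaking a scatter plot into quadrants amounts to splitting the points at one cut-off on each axis. The sketch below is a hedged illustration of that step: the function name `split_into_quadrants`, the cut-off values, and the sample points are all hypothetical, and points that fall exactly on a cut-off are assigned to the upper or right side by assumption.

```python
def split_into_quadrants(points, x_cut, y_cut):
    """Group (x, y) points into four quadrants around (x_cut, y_cut)."""
    quadrants = {
        "lower_left": [],
        "lower_right": [],
        "upper_left": [],
        "upper_right": [],
    }
    for x, y in points:
        vertical = "lower" if y < y_cut else "upper"
        horizontal = "left" if x < x_cut else "right"
        quadrants[f"{vertical}_{horizontal}"].append((x, y))
    return quadrants


# Made-up points, split at the hypothetical cut-offs x = 5 and y = 5.
points = [(1, 1), (9, 2), (2, 8), (7, 7), (5, 5)]
quadrants = split_into_quadrants(points, x_cut=5, y_cut=5)
```

Each group of points can then be given its own trend line and R-squared value, which is how the four quadrants were compared.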

Overall, there weren't any significant trends in the data with the limited information I had. I believe we need more data to find correlations strong enough to drive which products show on each user's landing page and to know what factors actually decrease the days between orders. Until we have more meaningful data, it doesn’t seem like we have the grounds to spend a lot of time digging into this question.

As a general recommendation, not backed by data, it seems to make sense to keep produce a high priority on the landing page for any user who has bought produce on Instacart before, as produce spoils relatively quickly. I would also recommend that 60% to 80% of the products shown to each user be products they have purchased before. That way, the page maintains what they keep coming back for while still introducing them to new products with each visit, so they can expand the list of things they keep coming back for.

