What is Arlington 2050?

The Arlington 2050 project is a yearlong initiative that aimed to gather community feedback on what Arlington should look like in 2050 and the challenges we must address to get there. I worked specifically with the 'Postcard test tracker dataset'. This dataset was collected from many locations, ranging from local libraries to community events. The data was collected strictly on postcards at those locations, whether filled out in person or dropped in a box.

Why did we analyze data for them? What did it accomplish?

I believe we analyzed data for them because it was a way to help our community, but also a way to work with real-world data. We had worked with random data before, but we had never analyzed and produced results for real-world data. We produced many things; the most prominent had to be word clouds and histograms, which were the most insightful and the easiest to understand. We also did semantic search using Word2Vec, which helped us understand what people associated certain things with (e.g., "missing middle" and "property").

Aspects of the Project!

Cleaning Data:

I used pandas for this process.

First, I took into account the fact that we had hundreds of entries from multiple locations across the county, which meant a little extra work when it came to cleaning the data. I needed to make sure that each location had its own dataframe, which would make getting specifics on certain locations easier. I achieved this by creating a new dataframe for each location, as sketched below.
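Here's a minimal sketch of that split, assuming the postcards live in a single CSV with a "Location" column (the file name and column names are my placeholders, not necessarily the dataset's real ones):

```python
import pandas as pd

# Load the postcard entries ("postcards.csv" is a placeholder file name)
df = pd.read_csv("postcards.csv")

# Split the full dataset into one dataframe per collection location
location_dfs = {
    location: group.reset_index(drop=True)
    for location, group in df.groupby("Location")
}

# Now getting specifics for a single site is just a dictionary lookup
central_library = location_dfs.get("Central Library")
```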

Creating Word Clouds:

I used spaCy, WordCloud, and Matplotlib for this process.

One thing we noticed in our word clouds was that every location seemed to have different common words. While some were similar, they were never exactly the same. The most common words ranged from "Arlington", "Community", and "Housing" to "Trees", "Green", and "Safe". All the word clouds had their similarities, but their differences stuck out more. It was super interesting how one part of Arlington focused on community while another focused on trees and green space!
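Here's a rough sketch of how a per-location cloud can be built, reusing the per-location dataframes from the cleaning step and assuming a hypothetical "Response" column holding the postcard text:

```python
import spacy
import matplotlib.pyplot as plt
from wordcloud import WordCloud

nlp = spacy.load("en_core_web_sm")

def make_word_cloud(texts, title):
    # Lemmatize with spaCy, dropping stop words and punctuation
    doc = nlp(" ".join(texts))
    words = [t.lemma_.lower() for t in doc if t.is_alpha and not t.is_stop]

    # Generate and display the cloud for this location
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate(" ".join(words))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()

# One cloud per location, so the differences between sites stand out
for location, frame in location_dfs.items():
    make_word_cloud(frame["Response"].dropna(), location)
```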

Sentiment Analysis

I used spaCy and SpacyTextBlob for this process.

I was able to see that while you'd assume most entries would be positive, not all are. One set of data had a polarity of 0.2850, which tells us it was fairly positive with some negativity mixed in (polarity ranges from -1 to 1). Subjectivity, on the other hand, tells us whether text is opinion-based or not, ranging from 0 to 1 (objective to subjective); here it showed that the entries are partially based on opinion.
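Here's a minimal sketch of how those scores are produced with spacytextblob; the example sentence and the "Central Library" / "Response" names are made up, and I'm assuming the current `doc._.blob` extension API:

```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")  # registers the TextBlob extension on Doc

doc = nlp("I love how green Arlington is, but housing costs worry me.")

# Polarity runs from -1 (negative) to 1 (positive);
# subjectivity runs from 0 (objective) to 1 (subjective).
print("polarity:", doc._.blob.polarity)
print("subjectivity:", doc._.blob.subjectivity)

# Averaging polarity across one location's entries gives a score like 0.2850
entries = location_dfs["Central Library"]["Response"].dropna()
scores = [nlp(text)._.blob.polarity for text in entries]
print("average polarity:", sum(scores) / len(scores))
```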

Semantic Search:

I used spaCy and Word2Vec for this process.

For this, I took user-input phrases and used spaCy's language model to measure their similarity to the dataset entries, then printed the three most similar phrases from the dataset, as sketched below. It leverages NLP-based semantic comparison to rank and retrieve relevant text entries.
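As a rough sketch of that ranking step, here's a version built on the word vectors bundled with spaCy's en_core_web_md model (standing in for a separately trained Word2Vec model), with the same assumed "Response" column as before:

```python
import spacy

# en_core_web_md ships word vectors, which similarity() needs to be meaningful
nlp = spacy.load("en_core_web_md")

def semantic_search(query, entries, top_n=3):
    # Score every entry against the query, then keep the top matches
    query_doc = nlp(query)
    scored = [(entry, query_doc.similarity(nlp(entry))) for entry in entries]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

query = input("Enter a phrase to search for: ")
for entry, score in semantic_search(query, df["Response"].dropna().tolist()):
    print(f"{score:.3f}  {entry}")
```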

Done!

We've analyzed the data in multiple ways! If you want a more in-depth look at the code, go to my markdown file!