Aspects of the Project!
Cleaning Data:
I used pandas for this process.
First, I took into account the fact that we had hundreds of entries from multiple locations across the county, which meant a little extra work when it came to cleaning the data. I needed to make sure that each location had its own dataframe, since that would make getting specifics on certain locations easier. I achieved this by creating a new dataframe for each location, as sketched below.
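A minimal sketch of that split, assuming the entries live in one dataframe with Location and Entry columns (the column names and sample rows here are hypothetical, not my actual data):

```python
import pandas as pd

# Hypothetical sample rows; the real dataset had hundreds of entries.
df = pd.DataFrame({
    "Location": ["Ballston", "Rosslyn", "Ballston"],
    "Entry": ["More green space", "Affordable housing", "Community gardens"],
})

# One dataframe per location, keyed by location name.
location_dfs = {
    location: group.reset_index(drop=True)
    for location, group in df.groupby("Location")
}

print(location_dfs["Ballston"])
```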
Creating Word Clouds:
I used spaCy, the wordcloud library, and Matplotlib for this process.
Something we noticed in our word clouds was that every location seemed to have different common words. While some were similar, they were never exactly the same. The most common words ranged from "Arlington", "Community", and "Housing" to "Trees", "Green", and "Safe". All the word clouds had their similarities, but their differences stood out more. It was super interesting how one part of Arlington focused on community while another focused on trees and green space!
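A minimal sketch of how one location's cloud could be built, with spaCy filtering out stop words before wordcloud and Matplotlib render it (the model name and sample entries are assumptions):

```python
import spacy
import matplotlib.pyplot as plt
from wordcloud import WordCloud

nlp = spacy.load("en_core_web_sm")  # assumed model

# Hypothetical entries for a single location.
entries = [
    "We love the community feel of this neighborhood.",
    "More trees and green space would make it feel safe.",
]
doc = nlp(" ".join(entries))

# Keep only meaningful tokens: no stop words, punctuation, or whitespace.
words = " ".join(
    token.text.lower()
    for token in doc
    if not token.is_stop and not token.is_punct and not token.is_space
)

cloud = WordCloud(width=800, height=400, background_color="white").generate(words)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```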
Sentiment Analysis:
I used spaCy and spacytextblob for this process.
While you'd assume that most entries would be positive, not all of them are. One set of data had a polarity of 0.2850, which tells us it's fairly positive with some negativity mixed in (polarity ranges from -1 to 1). Subjectivity, on the other hand, tells us whether something is opinion or not, ranging from 0 to 1 (objective to subjective), and it showed that the entries are partially based on opinion.
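A minimal sketch of scoring a single entry with the spacytextblob pipeline component (the model name and the sentence are made up, and in recent spacytextblob versions the scores live on doc._.blob):

```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the component

nlp = spacy.load("en_core_web_sm")  # assumed model
nlp.add_pipe("spacytextblob")

# Made-up entry, just to show the two scores.
doc = nlp("The new park is wonderful, but parking is a nightmare.")

print(doc._.blob.polarity)      # -1 (negative) to 1 (positive)
print(doc._.blob.subjectivity)  # 0 (objective) to 1 (subjective)
```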
Semantic Search:
I used spaCy and Word2Vec for this process.
For this, I took user-input phrases and used spaCy's language model to measure their similarity to a query I predetermined, then printed the three most similar phrases from the dataset. This leverages NLP-based semantic comparison to rank and retrieve relevant text entries; a sketch is below.
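A minimal sketch of that ranking, assuming a spaCy model that ships with word vectors (the small models don't) and a hypothetical set of entries and query:

```python
import spacy

# A model with word vectors, needed for meaningful similarity scores.
nlp = spacy.load("en_core_web_md")  # assumed model

entries = [  # hypothetical dataset entries
    "More trees along the bike paths",
    "Affordable housing near the metro",
    "Community events in the park",
    "Safer crosswalks for kids",
]

query = nlp("green space and trees")  # stand-in for the predetermined query

# Rank every entry by vector similarity to the query, highest first.
ranked = sorted(entries, key=lambda entry: query.similarity(nlp(entry)), reverse=True)
for phrase in ranked[:3]:
    print(phrase)
```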
Step 5: Done!
We've analyzed data in multiple ways! If you want a more in-depth look at the code, go to my markdown file!