---
title: "Tweets"
author: "Prof. Eric A. Suess"
---

#### Alternatively, ask ChatGPT to write 10 positive and 10 negative Tweets related to a topic of interest to you.

I used [ChatGPT](https://chat.openai.com/) to generate 10 positive and 10 negative Tweets and to output them as a tibble data frame, using the following prompts.

#### Prompt 1

> Write 10 positive and 10 negative tweets for posting on Twitter about cats. Each tweet should include at least two sentence. No emojis. Output the results to an R Tidyverse tibble dataframe format.

```{r}
library(pacman)
p_load("tidyverse", "tidytext", "magrittr", "sentimentr", "ggwordcloud")
```


```{r}
# Create a tibble dataframe with positive and negative tweets about cats
tweets <- tibble(
  tweet = c(
    # 10 positive tweets
    "Just adopted the cutest little kitten today! Can't wait for all the cuddles and playtime! #catlover",
    "My cat is my best friend. Always there to comfort me when I'm feeling down. #felinefriends",
    "Cats are the masters of relaxation. Watching my cat nap is so soothing. #catlife",
    "My cat is the purr-fect companion. Loving the snuggles and purrs! #catlove",
    "Nothing beats coming home to a happy and purring cat. #petlove",
    "I'm constantly amazed by my cat's agility and playfulness. So much joy in one little furball! #felineantics",
    "My cat's purr is like music to my ears. #happycatowner",
    "Cats have a special way of making a house feel like a home. #furryfamily",
    "My cat's playful antics always bring a smile to my face. #felinefun",
    "I'm grateful for the unconditional love and companionship my cat provides. #catsofTwitter",

    # 10 negative tweets
    "My cat knocked over a vase and made a mess. Not so cute right now. #catshenanigans",
    "My cat scratched up my favorite chair. Frustrated and upset. #catproblems",
    "Cleaning up furballs is not my idea of fun. #catmess",
    "Why do cats always choose to sit on my keyboard when I'm working? #catinterrupted",
    "My cat woke me up at 4 am demanding food. #catnightowl",
    "Dealing with cat hair on my clothes is a constant struggle. #catownerproblems",
    "My cat destroyed the blinds again. Feeling annoyed. #catdamage",
    "Cat hair everywhere! It's a never-ending battle to keep my home clean. #catstruggles",
    "My cat hissed at me for no apparent reason. Feeling rejected. #catattitude",
    "Trying to train my cat to stop scratching the furniture. Not going well. #cattrainingwoes"
  ),
  sentiment = c(
    # Label the sentiment of each tweet (positive or negative)
    rep("positive", 10),
    rep("negative", 10)
  )
)

tweets
```

```{r}
tweets %>% slice(18)
```

```{r}
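# Add helper columns: row id, hashtag flag, and the hashtag split into parts
tweets_df <- tweets %>% mutate(line = row_number(),
                  hashtag = str_detect(tweet, "#"),
                  semitag = str_replace(tweet, "#", ";"),
                  # allow upper case so "#catsofTwitter" is captured whole
                  hash = str_extract(tweet, "#[A-Za-z]+"),
                  hash_cat = str_extract(hash, "cat"),
                  # drop the leading "#" and an optional "cat" prefix
                  hash_other = str_remove(hash, "^#(cat)?"),
                  # remove the hashtag and the space left behind
                  tweet = str_trim(str_remove(tweet, "#[A-Za-z]+")),
                  tweet_len = str_length(tweet))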
tweets_df
```
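When stripping a `#cat` prefix from a hashtag, a character class and a grouped pattern behave quite differently. The following is a minimal sketch on three hashtags picked as edge cases; the variable names are illustrative only. It compares a character class such as `[#|#cat|cat]+` with the anchored group `^#(cat)?`, and a lower-case-only extraction with a mixed-case one.

```{r}
library(stringr)

# Hashtags chosen to expose the edge cases (illustrative vector)
tags <- c("#catattitude", "#felinefriends", "#catsofTwitter")

# Character class: strips any leading run of the characters #, |, c, a, t
class_result <- str_remove(tags, "[#|#cat|cat]+")

# Anchored optional group: strips "#" and then the literal "cat" if present
group_result <- str_remove(tags, "^#(cat)?")

# Lower-case-only extraction stops at the first capital letter
lower_only <- str_extract(tags, "#[a-z]+")
mixed_case <- str_extract(tags, "#[A-Za-z]+")

data.frame(tags, class_result, group_result, lower_only, mixed_case)
```

The character class treats `#`, `|`, `c`, `a` and `t` as individual characters, so it keeps consuming letters such as the extra `a` and `t` in `#catattitude`, while the grouped pattern removes only the literal prefix.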


A nice reference for additions to our book is [Notes on Text Mining](https://bookdown.org/Maxine/tidy-text-mining/) by Qiushi Yan.
See Chapter 2 Section 5 about [wordclouds](https://bookdown.org/Maxine/tidy-text-mining/wordclouds.html).

```{r}
library(ggwordcloud)

tweets_df %>% unnest_tokens(word, tweet) %>% 
  anti_join(stop_words, by = "word") %>% 
  count(word, sort = TRUE) %>% 
  ggplot(aes(label = word, size = n)) + 
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 15)

tweets_df %>% select(line, sentiment, tweet) %>% 
  unnest_tokens(word, tweet) %>%
  anti_join(stop_words, by = "word") %>% 
  count(sentiment, word, sort = TRUE) %>%
  ggplot(aes(label = word, size = n, color = sentiment)) + 
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 15)
```


```{r}
tweets_df %>% select(tweet) %>% 
  get_sentences() %$% 
  sentiment_by(tweet, list(element_id))
```
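One way to read the sentiment scores is to compare their sign with the label ChatGPT assigned. The following is a minimal sketch under stated assumptions: the two-row `toy` tibble is hypothetical, the column names `ave_sentiment_score` and `predicted` are illustrative, and treating a score above zero as "positive" is an arbitrary cutoff rather than a recommended rule. In the notebook, `tweets_df` could be used in place of `toy`.

```{r}
library(dplyr)
library(tibble)
library(sentimentr)

# Hypothetical two-row example with the same columns as the tweets tibble
toy <- tibble(
  tweet = c("I love my cat. She is wonderful.",
            "My cat ruined the sofa. I am annoyed."),
  sentiment = c("positive", "negative")
)

# One row of scores per tweet, in the same order as the input vector
scores <- sentiment_by(get_sentences(toy$tweet))

# Attach the average score and a label derived from its sign
toy_scored <- toy %>%
  mutate(ave_sentiment_score = scores$ave_sentiment,
         predicted = if_else(ave_sentiment_score > 0, "positive", "negative"))

# Cross-tabulate the given labels against the derived ones
toy_scored %>% count(sentiment, predicted)
```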


```{r}
library(tm)

tweet_corpus <- tweets_df %>% select(tweet) %$% 
  VCorpus(VectorSource(tweet))
print(tweet_corpus)
inspect(tweet_corpus[1:2])
as.character(tweet_corpus[[1]])
  
tweet_corpus_clean <- tweet_corpus %>% tm_map(content_transformer(tolower))
print(tweet_corpus_clean)
inspect(tweet_corpus_clean[1:2])
as.character(tweet_corpus_clean[[1]])

tweet_corpus_clean <- tweet_corpus_clean %>% tm_map(removeNumbers)
print(tweet_corpus_clean)
inspect(tweet_corpus_clean[1:2])
as.character(tweet_corpus_clean[[1]])

tweet_corpus_clean <- tweet_corpus_clean %>% tm_map(removeWords, stopwords())
print(tweet_corpus_clean)
inspect(tweet_corpus_clean[1:2])
as.character(tweet_corpus_clean[[1]])

tweet_corpus_clean <- tweet_corpus_clean %>% tm_map(removePunctuation)
print(tweet_corpus_clean)
inspect(tweet_corpus_clean[1:2])
as.character(tweet_corpus_clean[[1]])

tweet_corpus_clean <- tweet_corpus_clean %>% tm_map(stripWhitespace)
print(tweet_corpus_clean)
inspect(tweet_corpus_clean[1:2])
as.character(tweet_corpus_clean[[1]])

tweet_dtm <- DocumentTermMatrix(tweet_corpus_clean)
print(tweet_dtm)

# alternative solution: create a document-term 
# sparse matrix directly from the raw tweet corpus

tweet_dtm2 <- DocumentTermMatrix(tweet_corpus, control = list(
  tolower = TRUE,
  removeNumbers = TRUE,
  stopwords = TRUE,
  removePunctuation = TRUE,
  stemming = FALSE
))
print(tweet_dtm2)

```


```{r}
# word cloud visualization
library(wordcloud)

# with only 20 short tweets no word occurs 50 times, so use a low threshold
wordcloud(tweet_corpus_clean, min.freq = 2, random.order = FALSE)
```

```{r}
library(quanteda)

tweet_corpus2 <- tweets_df %$% corpus(tweet)
summary(tweet_corpus2)

docvars(tweet_corpus2, "sentiment") <- tweets_df$sentiment
summary(tweet_corpus2)

tokens(tweet_corpus2) %>% dfm()
```

#### Prompt 2

> Write 10 positive and 10 negative tweets for posting on Twitter about cats. Each tweet must have at least two sentences. No emojis. Output the results to an R Tidyverse tibble dataframe format. Include the sentiment and the tweet number in the dataframe.

```{r}
# Create a tibble for positive tweets
positive_tweets <- tibble(
  tweet_number = 1:10,
  sentiment = "Positive",
  tweet = c(
    "Cats are the best pets ever! They bring so much joy and comfort to our lives.",
    "I love how my cat purrs and snuggles with me. It's the best feeling!",
    "Cats are so independent and low-maintenance. Perfect for busy lifestyles.",
    "My cat always knows how to brighten my day with their playful antics!",
    "Cats are amazing hunters and keep my home free from pests. Thank you, kitty!",
    "Nothing beats coming home to a furry friend who is always happy to see you.",
    "Cats are incredibly graceful and elegant creatures. I'm in awe of their beauty.",
    "My cat is my best buddy and confidante. I can always count on them to listen.",
    "Cats are the masters of relaxation. Watching them nap is so calming and soothing.",
    "I'm so grateful to have a cat in my life. They bring so much happiness and love."
  )
)

# Create a tibble for negative tweets
negative_tweets <- tibble(
  tweet_number = 11:20,
  sentiment = "Negative",
  tweet = c(
    "Cats are so aloof and unaffectionate. They don't seem to care about their humans.",
    "I'm allergic to cats and can't stand their dander. It's always a sneezefest around them.",
    "Cats scratch up my furniture and leave hair everywhere. Such a mess!",
    "Cats are so picky with their food. They turn up their noses at everything I give them.",
    "I've had multiple bad experiences with cats scratching or biting me. Not a fan.",
    "Cats are notorious for knocking things off shelves and countertops. It's so annoying!",
    "Cats can be moody and unpredictable. You never know when they'll lash out.",
    "I've had to deal with litter box odor and cleaning up cat messes. Not fun at all.",
    "Cats are notorious for waking me up in the middle of the night with their meowing.",
    "I've had furniture destroyed by cats scratching and marking their territory. Not cool."
  )
)

# Combine the positive and negative tweets into one tibble dataframe
tweets <- bind_rows(positive_tweets, negative_tweets)

# Print the tibble dataframe
print(tweets)

tweets %>% get_sentences() %$% 
  sentiment_by(tweet, list(element_id))
```

```{r}
library(tokenizers)

tokenize_words(mobydick)

# tokenizers expects a character vector, so pass the tweet column itself
tokenize_words(tweets$tweet)

tweets_token <- tokenize_regex(tweets$tweet, pattern = "\\s+", simplify = FALSE) 
```