Thursday, October 31, 2013

11-7 Michael Speriosu et al. 2011. Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph.

31 comments:

  1. 1 - I'm curious about the real-life implications of mining Twitter for demographic research, primarily in terms of advertising: is Twitter a large enough demographic that one could tailor public sentiment in order to avoid, or mitigate, a PR disaster? I'm mostly curious whether Twitter represents a vocal few when it comes to lashing out (e.g., memes like #nohomo or the whole "bitch make me a sandwich" phenomenon), or whether people who, say, threaten to boycott Ender's Game or Macy's really are a segment of the population with enough financial power to represent serious revenue loss.
    2 - In terms of dealing with "noise" on Twitter, how do we account for things like autocorrect screw-ups or accidentally using the wrong emoticon? I'm also curious how we can work to interpret noisy tweets, as sometimes I'll read tweets that seem barely comprehensible due to all of the linguistic tics in them.

  2. 1. How do the studies of Twitter compensate for sarcasm, i.e., cases where the polarity is actually the opposite of how text-mining software would read it? Trend-based web sites, especially social media, are very fast-evolving in terms of language usage (abbreviations, slang, etc.) and are often also spaces of ironic language use, sarcasm, hyperbole, etc.

    2. I'm a Twitter user, and while I am well aware of the data mining done for marketing purposes, I'm kind of surprised tweets are being studied academically on such a micro-level, since most of the content is of a throwaway nature, except perhaps insofar as it reflects evidence of interest to linguists (slang, again; usage evolution). The self-selecting nature of the sample group is also problematic for me; Twitter users (especially "high volume" tweeters and those who regularly use the platform for political activism, such as those who would be commenting heavily on presidential debates) do not reflect a broad social sample.

  3. 1. Are there weight values among the top 20 most predictive unigrams? For example, “glad” can be a much stronger indicator of positivity than “moon”. And why would “moon” be chosen as a predictive unigram for positive tweets?

    2. The authors mark the beginning and end of tweets with ‘$’ in bigram features. Why are the beginning and end of a tweet treated as special? (A sketch of this boundary marking follows at the end of this comment.)


    3. What kinds of nuances of language can the authors’ approach handle? Can it handle sarcastic tweets, which contain the predictive unigrams or bigrams for positivity but actually have a negative underlying tone?
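
    A minimal sketch of how such boundary-marked features might be built, assuming naive whitespace tokenization (the paper’s actual tokenizer may differ):

    ```python
    def ngram_features(tweet, boundary="$"):
        """Extract unigram and bigram features, padding the token list
        with a boundary symbol so bigrams capture which words start
        and end the tweet."""
        tokens = tweet.lower().split()   # assumption: naive whitespace tokenization
        features = list(tokens)          # unigram features
        padded = [boundary] + tokens + [boundary]
        features += [f"{a}_{b}" for a, b in zip(padded, padded[1:])]  # bigrams
        return features

    # "Glad we won" -> unigrams glad, we, won plus bigrams $_glad,
    # glad_we, we_won, won_$ ; the $_glad feature lets a classifier
    # treat a tweet that *starts* with "glad" differently from one
    # that merely contains it.
    ```

    The same pairing trick extends to trigrams or longer n-grams, which a later comment asks about.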

  4. 1. Overall, it seems like certain classification methods work better on some data sets than on others. Is this to say that there is no true way to test polarity across all of Twitter, and that the polarity results are just dependent on the data sets?

    2. How would you test irregularities in the system? Something that doesn't meet the criteria and doesn't fall into one category or the other. Is it possible to tweet and then classify indifference?

    3. I found it interesting that emoticons were used as 'noisy indicators of polarity'. The number of emoticons that some people use is astounding, and some are, frankly, baffling. For example, %-{. What is that? Some are not clear indicators of polar feelings, like :o. How can the outlying emoticons, hashtags, and whatnot play an accurate role in training and gauging emotion? Or do they simply not?

  5. 1. The authors discussed the use of the lexicon-based baseline, which identifies polarized tweets based on “positive” and “negative” words found in the OpinionFinder subjectivity lexicon (56). However, while there is a large number of words in this lexicon, how does it keep up with emerging slang? Furthermore, what about community-specific positive and negative words? By using this approach, are all the polarized tweets being identified? (A sketch of this counting approach follows at the end of this comment.)

    2. The authors discussed collecting tweets for their datasets, and also mentioned other studies that collected tweets (54-6). What are the privacy regulations regarding this? The authors mention that tweets are frequently used to gauge public opinion/ customer input – but are these only on tweets that are visible to everyone? If so, are researchers getting a limited view – only the opinion of those who want everyone to see their thoughts; do they accurately represent the public? If not, are privacy regulations being abandoned for these studies and how would Twitter users feel about this?

    3. We talked in class about metadata specifically as it relates to Twitter – that people are using hashtags to be clever or add a punch-line, etc., and not really as a way to organize their tweets with others’. Is the “misuse” of hashtags a consideration when “mining” tweets? Does it/will it throw off efforts to obtain polarized tweets? Or are hashtags ignored?
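
    A minimal sketch of the counting logic the first question describes; the two tiny word sets are invented stand-ins for the OpinionFinder lexicon, and the random tie-break follows the rule quoted from p. 56 later in this thread:

    ```python
    import random

    # Hypothetical stand-ins for the OpinionFinder subjectivity lexicon.
    POSITIVE = {"glad", "great", "love"}
    NEGATIVE = {"awful", "hate", "sad"}

    def lexicon_label(tweet):
        """Label a tweet by counting lexicon hits; ties (including
        zero hits on both sides) are broken at random."""
        tokens = tweet.lower().split()
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        if pos > neg:
            return "positive"
        if neg > pos:
            return "negative"
        return random.choice(["positive", "negative"])
    ```

    Emerging slang is exactly the weak spot the comment identifies: a word absent from both sets contributes nothing, so a tweet written entirely in new slang falls through to the random tie-break.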

  6. 1. I’m curious how researchers assess when the accuracy rate for prediction reaches a high enough level where a certain amount of error would be acceptable. It seems that with the abundance of actual tweets that 100 percent accuracy would be less of a concern than predicting an overall trend.

    2. While this type of research seems beneficial for gauging generalized public opinion, I don’t see it offering very much insight or feedback to companies who are looking for detailed information about why users like or dislike their services or products. I would be interested to see how hyperlinks could be utilized to increase the richness of the data being gathered.

    3. I’m curious whether anyone has attempted to use Twitter data to unmask influential members of particular fields as a way of understanding trends in an isolated discipline. While I can understand why it might be beneficial to track all users, at what point does a tweet matter less? It seems that a lot of context is lost when the popularity and influence of particular users isn't taken into account when trying to measure these things.

    Replies
    1. 1 - You raise a good point -- the applications that Speriosu et al. propose are largely commercial, and general trends are usually far more important to e.g. advertisers than individual opinions. Though from observation of corporate Twitter feeds, it does seem that some companies are using advanced search algorithms to find individual customer complaints so that they can publicly respond -- but these complaining tweets are evaluated and answered by a live person before the company acts on them, rather than necessarily being taken as indicative of a trend.

  7. 1 How do the authors decide which tweets to choose? Twitter is a platform where people let off steam, sometimes even more emotionally than they would at other times. Could that affect the results? And sometimes people say things they do not mean, especially on social media; how could that be addressed?

    2 Do those predictive unigrams all have the same value? And what about people having different understandings of one particular word? What if we assigned a weight to each word? Would that work?

    3 What about tweets that do not match the standard? How do the authors solve the problem of precision within the context of the whole network? Can the results be representative?

  8. 1. In the conclusion of the article, the authors suggest that further research could be done to identify the polarity of tweets based on the polarity of the content they link to. Thinking about the types of content shared on my Facebook feed, a lot of people share articles with viewpoints that they think are outrageous, so their actual words might be sarcastic, but I know them well enough to understand their meaning. How would this affect such polarity algorithms?

    2. The Obama-McCain debate data set was evaluated for polarity by Mechanical Turk workers. I question whether this is a good source for evaluating polarity. Each tweet was evaluated more than once, but in the end only 58% of the tweets were used, as only those had a clear polarity. Based on the subjectivity of relevance that we have talked about, is it appropriate to have different people evaluate polarity within the same dataset? I would also think that the polarity could have been judged by international workers who may not be able to read English well enough to pick up the subtleties of sarcasm or colloquial words.

    3. The polarity algorithms judge positive and negative tweets based on the specific emoticons used. Often on social media such as Instagram or Facebook, people post pictures of themselves smiling or frowning to get their feelings across. Would it ever be possible for face recognition software to evaluate the polarity of text based on an attached photo?

  9. 1. This is the most interesting paper I have read this semester. My first question is about future applications of tweet analysis. What could we do with the technology for analyzing tweets to achieve business goals? Put differently, what is the hidden value of extracting emotional information from the Twitter community?

    2. I'm not familiar with the legal side, but is it okay to collect data from tweets without permission from Twitter's users?

    3. It seems the authors of the paper give every word the same weight, which might not be the best approach. How could we better weight focal words in tweets so that the results of the research match the real situation? A sketch of one weighting approach follows below.
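
    A hedged sketch of one such weighting scheme; the per-word weights here are invented for illustration, and in practice might come from a classifier's learned coefficients or from annotation:

    ```python
    # Hypothetical per-word polarity weights (invented for illustration);
    # larger magnitude means a stronger signal.
    WEIGHTS = {"glad": 2.0, "nice": 0.8, "hate": -2.5, "meh": -0.3}

    def weighted_score(tweet):
        """Sum per-word weights: the sign gives the label and the
        magnitude a rough confidence. Unknown words score zero."""
        return sum(WEIGHTS.get(t, 0.0) for t in tweet.lower().split())

    # weighted_score("glad but meh") == 2.0 - 0.3 == 1.7 -> weakly positive
    ```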

  10. The training set involving emoticons seems to be weighted heavily in favor of positive attitudes. How can emoticons be reasonably used for training or study, when determining overall positivity or negativity, given how many more positive emoticons there are? Also, did the emoticons include smartphone emoji, which can be displayed as small images instead of text, or did the API recognize these as their text equivalents? (A sketch of emoticon-based labeling follows at the end of this comment.)

    In the conclusion section, the article notes that text links within tweets could possibly be used to help determine results. If tweets on subjects such as politics can be difficult to pin down, how can the nuances of a much larger text be given a binary judgment, which would then be used to give a result for a specific tweet?

    How exactly would different follower relationships be determined moving forward, in an attempt to improve the overall follower graph? Would location, time spent following, retweets, replies, and direct messaging determine the overall relationship, and if so, how would the actual relationship be derived from those potential qualifiers?
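
    A minimal sketch of emoticon-based noisy labeling; the emoticon sets are illustrative (cf. the paper's Table 1), and discarding mixed-emoticon tweets is an assumption:

    ```python
    POS_EMOTICONS = {":)", ":-)", ":D", "=)"}  # illustrative sets only
    NEG_EMOTICONS = {":(", ":-(", "=("}

    def noisy_label(tweet):
        """Derive a noisy training label from emoticons alone; the
        emoticon is a proxy for sentiment, not ground truth."""
        tokens = tweet.split()
        has_pos = any(t in POS_EMOTICONS for t in tokens)
        has_neg = any(t in NEG_EMOTICONS for t in tokens)
        if has_pos and not has_neg:
            return "positive"
        if has_neg and not has_pos:
            return "negative"
        return None  # mixed or no emoticons: no usable noisy label
    ```

    Because positive emoticons are far more common, the labeled pool this produces skews positive, which is why balancing the classes (raised in a later comment) matters.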

  11. 1. While obviously much more complex than Twitter's 140 characters, would a system like this ever be able to analyze the polarity of normal, longer blog posts? It seems that the difficulty this system has is with the brevity of Twitter posts, so the additional information contained in blog posts might actually make them easier to analyze.

    2. "Another source of information that could be used to improve results is the text in pages that have been linked to from a tweet." I mentioned this in a previous question but the implementation of something like a Google Goggles kind of system here seems like it might be possible. It might even be easier since instead of interpreting a scene for details it's only trying to determine polarity.

    3. "Much work in sentiment analysis involves the use and generation of dictionaries capturing the sentiment of words." Are these dictionaries also able to determine sarcasm? If a sarcastic tweet contains a lot words with positive connotations but it is assumed that the author of the tweet is speaking in a sarcastic tone, how is this system going to detect that?

    Replies
    1. 1 - You bring up an interesting point about length adding context, and I'd be curious to see what Dr. Baldridge's take is on this. On the one hand, you have additional context and content to decipher positivity/negativity, but a blog post could also contain numerous emotions or opinions within one post, which might make it difficult to determine whether a "pro-con" blog post is conveying only one particular sentiment or emotion.

  12. 1) It was really interesting to read about the challenges of teaching computer algorithms to understand the sentiment behind what people are saying. This seems as if it would be especially challenging for analytics of Twitter and other social media, as the culture and idioms of the internet change far more rapidly than any culture before the internet age. Especially given the preponderance of memes, inside jokes, and slang that are often inscrutable to outsiders, how can analysts stay on top of these idiomatic developments?

    2) Another challenge that occurred to me as I was reading this article was the widespread use of irony and sarcasm. As of now, it seems as if it would be extremely difficult for a polarity-detection algorithm to, for instance, recognize a tweet that mocked a political figure by pretending to feel sorry for him/her. For example, I am thinking of an abundance of sarcastic “poor Anthony Weiner” tweets that appeared after his scandal broke. How would an algorithm distinguish irony from genuine sentiment? (Especially as even humans often seem to have trouble with this!)

  13. 1. What's interesting to me is that, when people use social media, they don't really consider its implications and don't necessarily think about the research that their 140 characters could be shedding light upon. That being said, not being a Twitter user myself, what sort of privacy or access restrictions exist on Twitter accounts?

    2. Could this system potentially account for varying degrees of emotion, or does it necessarily imagine emotion as being very black and white?

    3. Being unfamiliar with this type of research, I must admit that a lot of the terminology is rather foreign to me. A more basic explanation of the study would be helpful.

  14. 1. Not every tweet has a clear opinion, so we might end up with a very small volume of data after extracting and filtering all those tweets. However, tweets with unclear opinions still have opinions in most cases. How will we deal with that? If we lose a large amount of data, it is hard to judge a product by analyzing tweets, because the filtered data might mislead us.

    2. Regarding the polarity of emoticons, I think that only a few emoticons, such as those in Table 1, demonstrate very clear opinions. Most emoticons carry no opinion, or are just a habit that people append to every sentence. How can we deal with these emoticons?

    3. I think testing whether social connections can be used to improve polarity classification for individual tweets and users is a very impressive attempt. I have some thoughts about the second problem the authors mentioned. Because the user graph grows rapidly, the graph the authors constructed is only a snapshot of users' current social connections. However, I suspect the growth of the user graph has a practical time limit, because people move on to other topics very quickly.

  15. 1. On page 54 it says, “We use several Twitter datasets as training or evaluation resources. From the annotated datasets, only tweets with positive or negative polarity are used, so neutral tweets are ignored.” Why is this? How can a tweet be accurately judged as positive, negative, or neutral if they don’t know the true emotional state of the person tweeting?

    2. Similarly to what Susan said, how does this study account for sarcasm or irony? Taking that a step further though, how does this study account for emoticons that are used sarcastically or ironically in tandem with text-based tweets?

    3. The authors said they do not give any special treatment to retweets, though doing so is a future possibility. Surely they should focus on that sooner rather than later, as retweeting is one of the major features of Twitter. I would guess that the number of retweets a tweet receives is highly relevant in studying what is trending at the moment.

  16. 1. It is mentioned that "we also do not give any special treatment to retweets, though doing so is a possible future improvement." So I am wondering how retweets could affect the conclusions, and how large the effect would be.

    2. A balanced ratio of positive/negative labels is obtained by keeping the same number of positive and negative labels. Why should we keep the same amount? How could we filter the labels if one kind outnumbers the other? (A balancing sketch follows at the end of this comment.)

    3. How could the evaluation deal with polysemy? Sometimes a tweet does not express its literal meaning, as with sarcasm, which poses a problem for the polarity evaluation.
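
    A hedged sketch of one common way to enforce that balance, namely randomly downsampling the larger class to the size of the smaller; whether the authors used exactly this subsampling is an assumption:

    ```python
    import random

    def balance(pos_tweets, neg_tweets, seed=0):
        """Downsample the larger class so positive and negative
        examples are equal in number."""
        rng = random.Random(seed)
        n = min(len(pos_tweets), len(neg_tweets))
        return rng.sample(pos_tweets, n), rng.sample(neg_tweets, n)
    ```

    The discarded majority-class tweets are simply never used for training, which answers the filtering question at the cost of throwing data away.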

  17. 1. I’ve described Twitter as a “social Geiger counter” in the past, as trending hashtags rise and fall. Based on this paper’s analysis and Twitter’s ongoing popularity, is it likely to continue to be a useful research resource on public opinion?

    2. Not all Twitter feeds are public, and as time goes by there seems to be an increasing level of intrusion by employers, schools, etc into these public discourses. Is it likely that Twitter will become more enclosed (as Facebook has become), thus reducing its utility for this project?

    3. Twitter’s brevity leaves little room for ambiguity, and so it is relatively easy to assign polarity to statements using the extended guidelines and annotations created. Has this technique been applied to any Liveblogging or other event coverage that can be “Twitter-like” in making short statements covering a specific event?

  18. 1. This article is a little difficult for me, as I have never encountered this kind of material before. I think it may be related to machine learning, which I am interested in. So what is the difference between data mining and machine learning?

    2. What is polarity classification? Why do the authors use it to study Twitter? And why do the authors extract tweets based only on the initial query?

    3. Why do the authors consider emoticons noisy indicators of polarity? And I think there is a problem: the same emoticon may have different meanings in different cultures. How could we solve this?

  19. 1) How have companies, politicians and advertisers actually utilized Twitter data mining? What sorts of queries might they utilize these polarity classification tools to ask?

    2) How might this work group account for sarcasm, humor, irreverence, and other similarly complex linguistic tools?

    3) The emoticon-based training was quite biased towards positive opinions. If happy faces depicting positivity are used more frequently than sad faces depicting negativity, what can be done to get a more three-dimensional view of the assigned polarities surrounding an issue?

  20. This comment has been removed by the author.

  21. 1. Do noisy labels work in all situations for polarity classification? For example, if a particular tweet contains a relevant noisy label but the user is being sarcastic, the polarity is reversed. Can we use mnemonics coupled with noisy labels to increase efficiency?
    2. In designing the Twitter follower graph, two disadvantages are mentioned. Could a solution to the second problem be a dynamic graph, or an algorithm that updates with the addition of every follower or tweet? (See the propagation sketch after this comment.)
    3. Analyzing the text of pages linked from tweets is said to improve results. But isn't that more data to process and analyze? Doesn't this decrease efficiency in cases where the page is too large? Could we limit what kind of data we look at in a page, like the title or sub-headings?
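
    For context, below is a deliberately simplified label-propagation sketch over an adjacency list; this is a generic illustration, not the authors' exact update rule (they propagate labels over a graph joining tweets, word features, and users). A dynamic graph, as question 2 suggests, could add edges as new follows or tweets arrive and rerun the loop, warm-starting from the previous scores:

    ```python
    def propagate(edges, seeds, iters=20):
        """Generic label propagation. Each node's polarity score moves
        toward the average of its neighbors' scores, while seed nodes
        (e.g. tweets with emoticon-derived labels) stay clamped.
        Scores live in [-1, 1]: -1 negative, +1 positive.

        edges: dict mapping node -> list of neighbor nodes
        seeds: dict mapping node -> fixed score
        """
        scores = {n: seeds.get(n, 0.0) for n in edges}
        for _ in range(iters):
            new = {}
            for node, nbrs in edges.items():
                if node in seeds:  # clamp seeded nodes
                    new[node] = seeds[node]
                elif nbrs:
                    new[node] = sum(scores.get(m, 0.0) for m in nbrs) / len(nbrs)
                else:
                    new[node] = scores[node]
            scores = new
        return scores
    ```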

  22. 1. Many tweets are considered to be "noisy", with people using hashtags to be witty or funny rather than to tag a trending topic. Do these ‘off’ tweets affect the polarity reading?


    2. Some of the predictive words were very timely, such as "murphy" and "brittany". These words could easily switch polarity if a new Murphy or a new Brittany were to make headlines. Is it okay to use words that are so sensitive to circumstances in a training sample?

    3. p. 56 "If the number of positive and negative words in a tweet is equal (including zero for both), the label is chosen at random" Why not neutral instead of one or the other, postive or negative at random?

  23. 1. On page 61, the authors briefly mention the generation of dictionaries that capture the sentiment of words. I find this concept interesting and wonder to what degree such dictionaries already exist. Even if they don't, is there some kind of established standard or understanding of key words and their relative sentiments?
    2. To what extent does this study account for specific Twitter demographics (like Black Twitter, for example), and how sentiment analysis may differ among them?
    3. I'm curious about the "noisy labels" taken into account within this study. Based on research findings, are there any conclusions that can be drawn about them, and the differences among them?

  24. 1. How does this study take into account emoticons that cannot be quantified as simply positive or negative, such as ":P" which is often used to indicate sarcasm, or at least a lack of seriousness by the tweeter? What about some of the more elaborate ones found on websites like: http://japaneseemoticons.net/all-japanese-emoticons/ Would these simply be seen as noise and disregarded, even though people clearly use them to give their tweets a 'tone'?

    2. In the same vein, how do they take sarcasm or satire into account, particularly since it's commonly misinterpreted and leads to people either going out of their way to indicate that what they tweet is sarcastic, or simply not caring to clarify, even at the risk of sounding 'trollish'?

    3. While I find this idea of analyzing Twitter (and, on a smaller scale, Internet culture at large) interesting, I felt that breaking everything down into a simple positive or negative was something of a turn-off. I would have liked there to be a third option that wasn't called something like "noise", because I think that label results in a lot of nuanced tweets essentially being treated as garbage by this study. Multi-line tweets (tweets that cover one train of thought across several posts due to the character limit), for instance, can provide a more detailed point of view than the quick, buzzword-laden one-liner; where do they fit here?

  25. 1) Given that the authors state that political opinions are nuanced and complex in nature, aren’t there better Twitter datasets to choose from to analyze these programs than a political debate and highly politicized legislation?
    2) I found this project particularly interesting, but I question whether Twitter is a good way for companies or political organizations to judge public opinion. I’ve not looked at the demographics, but I would not think that the active users of Twitter form a representative sample appropriate for any institution to judge public opinion and make financial or political decisions.
    3) The Library of Congress is archiving tweets because they believe tweets contain significant research value. This article made me curious about what else could be discovered by looking through tweets, and how the technologies discussed in the paper might help make searching the hundreds of billions of tweets at the Library of Congress possible.

  26. 1. In this article the authors test several different methods of classifying the polarity of tweets as negative or positive. In the lexicon-based baseline method, they state that tweets containing an equal number of positive and negative terms are randomly assigned to be positive or negative. However, in other areas of the article, tweets that might be both positive and negative, or neutral, are either denoted as both positive and negative or removed from testing. What is the benefit of randomly labeling such tweets as one or the other, compared to labeling them some other way that might not skew the data as much?
    2. In this article the authors use tweets about major political events and issues as their datasets. However later on in the article they state that using these methods to classify the polarity of political opinions is difficult because political opinions are more nuanced. What is the benefit of doing this test on political tweets as opposed to other tweets, like reactions to nonpolitical events, which might produce clearer results?
    3. In this article the authors use methods of classifying the polarity of tweets that rely at least partially on the negative and positive words in the tweet. However what about the case of sarcasm where someone might say one thing but mean the opposite? Would situations like that tend to create false positives or negatives in the results? Do you think that any of these systems would be good at detecting sarcasm?

  27. 1. The authors state that "we also do not give any special treatment to retweets, though doing so is a possible future improvement." What influence might retweets have on the research results?

    2. In Table 7 (top 20 most positive and most negative n-grams in OMD after running LPROP with All-edges and Noisyseed), there are single words as well as short phrases. This confused me about the definitions of "positive terms" and "negative terms" in this article: do they mean the polarity of each individual word or short phrase, or of the whole sentence, or all three?

    3. I am wondering whether they used a dictionary or controlled vocabulary to help them identify polarity. If so, did they, or should they, take slang words (such as LOL) into consideration, since most users on Twitter are young people?

  28. 1. The authors mentioned that the vast stream of real-time data (tweets) has major implications for any entity interested in public opinion, and even in acting on what is learned and engaging with the public directly. I think all of the research the authors did rests on the assumption that what people post is exactly what they think. Before the analysis results can be persuasive, I think they need to show that this assumption holds.

    2. In the dataset section, the authors point out that from the annotated datasets, only tweets with positive or negative polarity are used, so neutral tweets are ignored. How do they filter out the neutral ones? What are the criteria for identifying a tweet as "neutral"?

    3. All the approaches the authors discuss extract tokens from tweets and perform statistical analysis on the gathered tokens, but how can tokens extracted from their context stand on their own for decision-making? Sometimes people use positive words in a negative context; the tweet is then negative, while the classifier will place it in the positive class because of the positive tokens.

  29. 1. Like the authors found in their results, I didn't expect that the follower/following connections would be very helpful, especially for the events being measured (presidential debates, etc.). Because tweets are real-time phenomena, I think it would be difficult to predict or observe any true influence down the line from tweet author to follower. Also, this gets complex given the number of users who mutually follow one another.

    2. I appreciate that the authors coded for bigram tokens to allow for seemingly negative-sounding positive bits like "the shit." However, I'm wondering how this analysis would change if tokens were expanded to three or more words.

    3. "I really didn't need the reminder that Brittany Murphy died. Thanks, Dr. Lease :(" <--- my next tweet. You can run analysis on it, but I'll tell you up front that it's a negative. This potential tweet aside, I don't know how accurate a method it is to use emoticons to code for positivity of a tweet. In much the same way, I feel that hashtags are used more as an ironic finishing touch to tweets rather than actual indicators of tweet contents or sentiments.
