GitHub - makbn/twitter_persian_news_tagcloud: Tag cloud generator that extracts hot keywords from Twitter page of a Persian news agency

Twitter Persian news tagcloud extraction

Final project for an Information Retrieval course.

TPNT is a tag cloud generator that extracts hot keywords from the Twitter page of a major Persian news agency, in the Economics and Social fields, for each month of a year.

Dependencies

GetOldTweets-java v1.2.0
Lucene 7.2.1

News agency

Tasnim News (@TasnimNews_Fa)

How to Run

This project has two main steps. First, tweets are stored in a CSV file with the help of the Crawler class. This class needs some options to work properly:

Flag	Description	Required
`-i`	The ID of the Twitter page	yes
`-s`	Start date of extraction, format: `YYYY-MM-DD`	yes
`-e`	End date of extraction, format: `YYYY-MM-DD`	no
`-m`	Limit on the number of retrieved tweets	no
`-p`	Path of the CSV file	no
`-n`	Name of the CSV file	no

An example that retrieves tweets from @TasnimNews_Fa between 2018-06-01 and 2018-07-01 and stores them under the $PWD/result/ path:

java -cp ProjectNews.jar ir.ac.um.ce.projectnews.crawler.Crawler -i Tasnimnews_Fa -s 2018-06-01 -e 2018-07-01 -p result/
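The flag table above can be read as a small validation contract: `-i` and `-s` must be present, and the rest may be omitted. The sketch below shows one way such arguments could be parsed and checked. It is not the project's Crawler code: the `CrawlerOptions` class, its field names, the use of `null` for missing optional values, and the check that `-e` is not earlier than `-s` are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the repository): parses the documented
// flags and rejects input that lacks the two required ones.
class CrawlerOptions {
    final String id;          // -i, required
    final LocalDate start;    // -s, required, ISO date such as 2018-06-01
    final LocalDate end;      // -e, optional, null when absent
    final Integer maxTweets;  // -m, optional, null when absent
    final String path;        // -p, optional, null when absent
    final String name;        // -n, optional, null when absent

    private CrawlerOptions(Map<String, String> flags) {
        this.id = require(flags, "-i");
        // LocalDate.parse accepts the YYYY-MM-DD form and throws on anything else.
        this.start = LocalDate.parse(require(flags, "-s"));
        this.end = flags.containsKey("-e") ? LocalDate.parse(flags.get("-e")) : null;
        this.maxTweets = flags.containsKey("-m") ? Integer.valueOf(flags.get("-m")) : null;
        this.path = flags.get("-p");
        this.name = flags.get("-n");
        // Assumed sanity check: the README does not state this rule.
        if (end != null && end.isBefore(start)) {
            throw new IllegalArgumentException("-e must not be earlier than -s");
        }
    }

    private static String require(Map<String, String> flags, String flag) {
        String value = flags.get(flag);
        if (value == null) {
            throw new IllegalArgumentException("Missing required flag " + flag);
        }
        return value;
    }

    // Expects alternating "flag value" pairs, as in the example command.
    static CrawlerOptions parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Every flag needs a value");
        }
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            flags.put(args[i], args[i + 1]);
        }
        return new CrawlerOptions(flags);
    }
}
```

Called with the arguments of the example command, `CrawlerOptions.parse` would return an object whose `maxTweets` and `name` are `null`, since `-m` and `-n` are not given there.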

The next step is indexing the docs. After removing stop words from the docs, we use the Searcher and Classifier classes plus a bag of words to create queries that estimate the correlation of each doc with a context (Economics or Social). Finally, we use the most correlated words to generate a tag cloud.
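To make the idea of stop-word removal followed by keyword selection concrete, here is a deliberately simplified sketch. It does not use Lucene and does not reproduce the project's Searcher or Classifier classes: it swaps in plain term-frequency counting as a stand-in for the correlation estimate. The `KeywordCounter` class, the `topKeywords` method, and the tokenization rule (split on ASCII whitespace and Unicode punctuation) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the ranking step: counts how often each
// non-stop-word token occurs and returns the most frequent ones.
class KeywordCounter {
    static List<String> topKeywords(List<String> docs, Set<String> stopWords, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Split on whitespace and punctuation; letters are left untouched.
            for (String token : doc.toLowerCase(Locale.ROOT).split("[\\s\\p{P}]+")) {
                if (token.isEmpty() || stopWords.contains(token)) {
                    continue; // drop empty fragments and stop words
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically for stable output.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

The returned list could then be passed to any tag cloud renderer, with the counts used as font weights. A real run on Persian tweets would load the stop-word set from a file such as a stop-word list rather than hard-coding it.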

