-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.txt
210 lines (157 loc) · 7.14 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
Libraries needed:
* PyPDF
* PyPDF2
* Camelot
* Tabula
* Pandas
* Numpy
* Matplotlib
* Seaborn
* Os
* Tabulate
-----------------------------------------------------------------------------------------
Steps to execute project:
1. Extract group34-project.zip in your local system
2. Steps to execute the preprocessing part:
Note: The Preprocessing part will take some time (around half an hour) please be patient.
1. Goto folder named "Preprocessing"
2. Goto Preprocessing1 or Preprocessing2
3. Open your command line Terminal in the same folder and run the following commands
./preprocessing1.sh or ./preprocessing2.sh to run either preprocessing parts.
3. Steps to execute the analysis part:
1. Goto folder named "Analysis"
2. Open your command line Terminal in the same folder and run the following commands
./analysis.sh
-----------------------------------------------------------------------------------------
NOTE:
1. Outputs are generated in folder named ‘Outputs’
2. ‘Figures’ folder contains all images
3. "Preprocessing" folder contains all our pre-processing code that we used to generate suitable CSV files
4. We have used Python 3.8.10 to compile all our Python scripts
-----------------------------------------------------------------------------------------
Questions Description (corresponding CSVs, sh file names):
1. Which program in each of the top(5) institutes has the highest/lowest female to male ratio?
* Output CSV’s:
* “Top institutes with highest female-male ratio.csv”.
* Executable bash file: q1.sh
2. For all the institutes find the ratio of socially challenged (SC/ST, OBC etc) to a total number of children as well as economically backward people?
* Output CSV’s:
* “Top institutes with highest economically backward-total ratio.csv”,
* “Top institutes with highest socially backward-total ratio.csv”.
* Executable bash file: q2.sh
3. Which institute offers the maximum number of people a full waiver in the fee?
* Output CSV’s:
* “Top programs and colleges which provide reimbursement to maximum number of students.csv”.
* Executable bash file: q3.sh
4. Which institute has the percentage of people who do not belong to its state (foreign as well as different states)? show the diversity present among the students of each course(ratio of instate to outside students)
* Output CSV’s:
* “Top institutes with highest number of migrants.csv”.
* Executable bash file: q4.sh
5. Which are the colleges that have the highest overfill/underfill per cent?
* Output CSV’s:
* “Top institutes with highest overfill percent.csv”,
* “Top institutes with highest number of vacancies.csv”.
* Executable bash file: q5.sh
6. Which institutes have lifts/ramps in less than 80% of the buildings?
* Program name: q6.py
* Executable File: q6.sh
* Output CSV:
* Lift Facilities.csv
7. Which institutes don't allow inter-building movements? Also report the overall percentage of such institutes.
* Program name: q7.py
* Executable File: q7.sh
* Output CSV:
* Movement Facilities.csv
8. What all instis. don't have special(differently abled people) toilets in at least 80% buildings?
* Program name: q8.py
* Executable File: q8.sh
* Output CSV:
* Toilet Facilities.csv
9. Find the top 5 and bottom 5 institute names, ratio which have the highest #student/#faculty ratio?
* Program name: q9.py
* Executable File: q9.sh
* Output CSV’s:
* Top 5 student to faculty ratio.csv,
* Worst 5 student to faculty ratio.csv
10. Find the top 5 institutes which received the highest annual earnings over the period of the last three years, 2017-18 to 2019-20(sum).
* Program name: q10.py
* Executable File: q10.sh
* Output CSV’s:
* 2017-18 EDP Earnings.csv,
* 2018-19 EDP Earnings.csv,
* 2019-20 EDP Earnings.csv,
* Total EDP Earnings.csv
11. Which college has the highest benefit from Executive Development Programme #Amount of Annual earnings / #Total students.
* Program name: q11.py
* Executable File: q11.sh
* Output CSV’s:
* EDP to Strength ratio 2018.csv,
* EDP to Strength ratio 2019.csv,
* EDP to Strength ratio 2020.csv,
* EDP to Strength ratio Total.csv
12 Find the colleges with the highest percentage of students going for higher studies.
* Output CSV’s:
* “higher_studies.csv”.
* Executable bash file: q12.sh
13. Find the colleges with a minimum percentage of unplaced students. (year-wise)
* Output CSV’s:
* “unplaced.csv”.
* Executable bash file: q13.sh
14. Find the course per college (over all the years: sum) that provides the highest package?
* Output CSV’s:
* “best_program.csv”.
* Executable bash file: q14.sh
15. For each college, find the average number of students doing a PhD over the last 3 years, full time as well as part-time?
* Output CSV’s:
* “phd.csv”.
* Executable bash file: q15.sh
16. Find the top 5 and bottom 5 institutes which received the highest fundings from the consultancy projects for the last three years.
* Executable File: q16.sh (uses q16.py)
* Output Files:
* top5-consultancy-funding-{year}.csv
* bottom5-consultancy-funding-{year}.csv
17. Find the Top 5 institutes with the highest number of projects for each of the years.
* Executable File: q17.sh (uses q17.py)
* Output File:
* consultancy_projects_top5.csv
18. For each institute, find the year in which the ratio between the Number of client Projects and the Number of Client organisations was the highest.
* Executable File: q18.sh (uses q18.py)
* Output File:
* Maximum-project-to-client-ratio-year.csv
19. Find the percentage of sponsored research projects and consultancy projects for each institute for 3 years.
* Executable File: q19.sh (uses q19.py)
* Output File:
* Sponsored Research projects and consultancy projects.csv
20. Which program per institute had the highest as well as the lowest intake for the last 2-3 years?
* Program name: q20.py
* Executable File: q20.sh
* Output file:
* min_max_program.csv
21. Find the year in which every institute had the highest ratio for # projects/#agency among the above mentioned three years?
* Program name: q21.py
* Executable File: q21.sh
* Output file:
* max_project_year.csv
22. Find the top 5 (in decreasing order) institute which received the highest amount over the period across the last three years, 2017-18 to 2019-20.
* Program name: q22.py
* Executable File: q22.sh
* Output file:
* highest_sponsorship.csv
23. Find the total intake of every insti. And the top 5 and bottom 5 institutes.
* Program name: q23.py
* Executable File: q23.sh
* Output file:
* 2019-20_intake.csv,
* 2018-19_intake.csv,
* 2017-18_intake.csv,
* 2016-17_intake.csv,
* 2015-16_intake.csv,
* 2014-15_intake.csv and
* overall_intake.csv
24. Report colleges having rare courses (PG 1 year).
* Program name: q24.py
* Executable File: q24.sh
* Output file:
* rare_program.csv
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------