k-index.Rmd

---
title: "Citation network centrality: a scientific awards predictor?"
output: html_notebook
bibliography: refs.bib
---

# Introduction 

This is `K-INDEX`, an implementation and a test case
of a scholarly literature (Scientometrics) measure 
to rank research authors using the citations
of articles' citations. 
The test case is its use to try to predict the
the recipient of scientific awards. We use the Web of Science data
as input and use the results for the prediction of Nobel Laureates
of Physics.

# Input Data 

The data to be processed comes from CSV (comma-separated
values) and TSV (tab-separated vaules) files containing, among other
data, the papers and its number of citations (CSV) or number of
citings (TSV) of researchers. Each file stores data about one
researcher.  The citing is the number of citations received by a
paper that cites the researcher paper in question. The CSV files are
used to calculate the $h$-index and TSV are used to find the
$K$-index. An index file with author's identification and some
information like his/her homepage is used to associate the data
files. For example, an author with an Researcher ID equals to
''Z-1111-1900'' has the articles' citings in a file named
''Z-1111-1900.txt''.

Authors basic information was picked from the Web of Science page,
more specifically in
[Clarivate](https://hcr.clarivate.com/\#categories\%3Dphysics) page of
most cited authors in physics.

The fields are separated by semicolon inside |authors.idx|, a record in
the file looks like

```{csv}
L-000-000;Joe Doe;http//joedoe.edu
```

\noindent where the first field `L-000-000` is the Research ID or ORCID,
when the author doesn't have an identifier, a custom
number is assigned. The second field `Joe Doe` is the author name
and the third field is the link to the page that contains information
about author's publications. A structure is loaded with these data and
a pointer to this structure is passed to the array |authors|.  Lately,
$h$-index and $K$-index will be calculated and assigned to the proper
field in the structure.

# Indices Calculation

There is procedure in this program to
calculate the Scientometrics index $K$. The $h$ is the Hirsch index
proposed by Hirsch [-@hirsch2005]. It is obtained at Web of Science, so no further
procedure is needed. 
The $K$ index and was proposed by Kinouch _et al._ [-@kinouchi2018].

## h-index

The number of papers is in decreasing order of citations
that the number of citations is greater than the paper position is the
$h$-index.  On Web of Science homepage, the procedure to find the $h$ of
an author is as follows:

* Search for an author's publications by `Author`
 or `Author Identifiers`;
* Click on the link `Create Citation Report`;
* The $h$-index is showed at the top of the page.

The $h$-index value is stored in the author record structure and saved
in `authors.idx` file.

```{r}
k.index.comment.char <- "#"
k.index.csv.sep <- ";"
authors <- read.csv(file = "phys.csv", 
                    header = TRUE, 
                    sep = k.index.csv.sep, 
                    comment.char = k.index.comment.char)

```

## K-index. 

If an author receives at least K citations, where each
one of these K citations have get at least K citations, then the
author's $K$-index was found. On Web of Science homepage, the procedure
to find the K of an author looks like below:

* Search for an author's publications;
* Click on the link `Create Citation Report`;
* Click on the link `Citing Articles without self-citations`;
* Traverse the list, stopping when the rank position of the article were
greater than the {\it Times Cited\/};
* Subtract on from the rank position, this is the K value.

To calculate in batch mode, we downloaded a file with the data to
calculate the $K$ by clicking on the button `Export...` and
selecting `Fast 5K` format that saves the same data, with limit
of 5.000 records, where each field is separated by one or more tabs
that is assigned to variable `kindex.tsv.sep`.

```{r}
kindex.tsv.sep <- "\t"
```


# References

<div id="refs"></div>