Thursday, November 23, 2017

What is Open Data? Have You Heard About It?

Awareness on Open Data_Opened Eye as a Symbol
Nowadays, we are becoming internet savvy. Most of us depend on data such as documents (DOCX, PDF, PPT, XLS), images (JPEG, PNG), audio, video and so on, which are published openly and are freely available to all of us. We take advantage of this data in many ways: students, academicians and researchers use it for literature reviews and citations, while for statisticians, economists and policymakers data is the main source for fact-finding and formulating new strategies. All these examples show how important data is. In this blog post, I share ideas on "Open Data" based on the "open data awareness program" organized by Kathford R&D's wing 'KATHFOSS' on the premises of Kathford International College on 2017-11-22. The resource persons for the event were Mr. Nikesh Balami, CEO, Mr. Shubham Ghimire, COO, and Mr. Sagar Ghimire, CTO, of an emerging organization, Open Knowledge Nepal.

The event was conducted in three sessions. Here is the summary of Session I:

A. What is Open Data?

According to Nikesh Balami, CEO of Open Knowledge Nepal, data must meet at least three criteria to be open data:

i. Data should be available on the Internet: 

It should be freely accessible through websites and data portals.

ii. Data should be machine-readable: 

The formats progress as PDF -> XLS -> CSV -> RDF -> LOD (also known as 5-star open data).

iii. Data should be open licensed: 

Data that does not explicitly have an open license is not open data.
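
As a rough illustration, the three criteria above can be written as a small check in Python. This is only a sketch: the `Dataset` record, the `STAR_BY_FORMAT` mapping and the rule that "machine-readable" means two stars or more are assumptions made for this example, not something presented at the event.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative mapping of formats onto the 5-star scale (an assumption
# for this sketch, not an official table).
STAR_BY_FORMAT = {
    "pdf": 1,  # online with an open license, but hard for machines to read
    "xls": 2,  # structured and machine-readable
    "csv": 3,  # structured and in a non-proprietary format
    "rdf": 4,  # uses URIs to identify things
    "lod": 5,  # linked to other data
}


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    url: Optional[str]        # where it can be downloaded, if anywhere
    file_format: str          # e.g. "csv"
    license: Optional[str]    # e.g. "CC-BY-4.0", or None if unlicensed


def star_rating(ds: Dataset) -> int:
    """Return 0 if the data is not online or not openly licensed."""
    if not ds.url or not ds.license:
        return 0
    # Unknown formats still earn one star: online and openly licensed.
    return STAR_BY_FORMAT.get(ds.file_format.lower(), 1)


def is_open_data(ds: Dataset) -> bool:
    """All three criteria: online, machine-readable, openly licensed."""
    return star_rating(ds) >= 2
```

With this sketch, a CSV file published online under an open license passes the check, while the same file without a license does not.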

B. Benefits of Open Data

i. For the government:

It increases the transparency and accountability of the government and hence builds public trust. The Right to Information (RTI) Act 2007 guarantees that Nepali citizens can access information on the functioning of any ‘public body’ in order to make governance and policymaking more transparent and accountable. The National Information Commission (NIC) is responsible for the promotion and implementation of the RTI Act.

ii. For the students, academicians, researchers, entrepreneurs and startup firms:

It helps with research and innovative projects, and supports business growth, e-learning and so on.

C. Principles of Open Data

  • Completeness
  • Timeliness
  • Primacy
  • Access
  • Machine readability
  • Non-discrimination
  • Use of commonly owned standards
  • Licensing
  • Permanence
  • Usage Cost

Now, here is the summary of Session II:

A. Current Situation in Nepal:

There are a few stakeholders in this sector: civil society organizations (CSOs) such as Kathmandu Living Labs, Open Nepal, Open Knowledge Nepal, Freedom Forum, Code for Nepal, Accountability Lab and Bikas Udhyami work in policy research, advocacy, tech, journalism and so on.

List of CSOs (Source: Open Data Manual, compiled by Open Knowledge Nepal)

Kathmandu Living Labs:
 Open Nepal:
 Open Knowledge Nepal:
 Freedom Forum:
 Code for Nepal:
 Accountability Lab:
 Bikas Udhyami:

B. Open data source for Nepal

The idea of open data entered Nepal in early 2013. Published data is still not available in open formats (most of it is published as PDF). Although citizens have the right to request and obtain data from any government body through the RTI Act, the Act has no legal provision to compel government offices to open up their information.

List of some government data sources (Source: Open Data Manual, compiled by Open Knowledge Nepal)

 Official Portal of Government of Nepal:

 National Planning Commission:

 Central Bureau of Statistics:

 Ministry of Finance:
 Nepal Rastra Bank:
 Ministry of Home Affairs:
 Ministry of Education:
 Ministry of Health:
 Election Commission Nepal:
 Office of Company Registrar:

 List of some international data sources: 

 World Bank:
 United Nations:
 UN Digital Repository in Nepal:
 UNICEF Nepal:
 World Food Programme:

 List of some CSO data sources: 

 Open Nepal:
 Election Nepal:
 Nepal in Data:
 NepalMap:

This session was followed by the demo of and nepalindata

Finally, Session III covered the following:

A. The process of working with data

i. Data Extraction

Data extraction is the process of retrieving data from non-machine-readable or unstructured sources (web pages, emails, PDF documents, scanned documents and so on). Raw data cannot be accessed directly from such unstructured sources. The following are ways of extracting data from PDF:

• Word/Excel converters to extract text from PDF: https://www.pdftoexcelonline.
• Programming, with some libraries existing for Python, Java, and the command line.
• Using Tabula - an offline open-source software specifically designed to get data out of PDF documents.

Other data extraction tools:

a. Basic scraping tools


b. Extracting data with Python

 Scrapy:

c. Web scraping tools

 ScraperWiki:
 OutWit Hub:
 Scraper:
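
To give a feel for what these scraping tools do under the hood, here is a minimal sketch that pulls the rows of an HTML table into Python lists using only the standard library's `html.parser`. The `TableExtractor` class, the `extract_table` helper and the sample HTML are made up for illustration; real pages are messier, and a dedicated tool such as Scrapy is usually a better fit.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page with placeholder values.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Value</th></tr>
  <tr><td>District A</td><td>10</td></tr>
  <tr><td>District B</td><td>20</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The extracted rows are plain strings, so they still need the cleaning step before any analysis.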

ii. Data cleaning

It is the process of fixing errors, duplicates, and format/standard inconsistencies in the extracted data. Tools and languages: spreadsheets, OpenRefine, Python.
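
A small Python sketch of such cleaning is shown below. It assumes a CSV with hypothetical `district` and `value` columns and made-up rows; the rules (trim spaces, unify capitalization, strip thousands separators, drop duplicates and unparseable rows) are just one reasonable choice.

```python
import csv
import io


def clean_rows(raw_csv):
    """Trim whitespace, normalise names, parse numbers, drop duplicates."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen = set()
    cleaned = []
    for row in reader:
        # Collapse repeated spaces and use consistent capitalization.
        name = " ".join(row["district"].split()).title()
        # Remove thousands separators such as "1,200".
        value_text = row["value"].replace(",", "").strip()
        if not name or not value_text.isdigit():
            continue  # skip rows that cannot be repaired automatically
        record = (name, int(value_text))
        if record in seen:
            continue  # skip exact duplicates after normalisation
        seen.add(record)
        cleaned.append({"district": name, "value": record[1]})
    return cleaned


# Made-up messy input for illustration.
RAW = """district,value
 district a ,"1,200"
District A,1200
DISTRICT B, 350
district c,n/a
"""

cleaned = clean_rows(RAW)
for record in cleaned:
    print(record)
```

Here the first two rows collapse into one record, and the row with "n/a" is dropped rather than guessed at.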

iii. Data analysis

It is the process of examining and exploring datasets in order to generate required information.
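
For example, a very simple analysis is to group records and compute a count, total and average per group. The sketch below uses only the Python standard library; the `RECORDS` list and the `summarise_by` helper are hypothetical and the numbers are placeholders.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration.
RECORDS = [
    {"region": "East", "value": 120},
    {"region": "East", "value": 180},
    {"region": "West", "value": 90},
    {"region": "West", "value": 110},
    {"region": "West", "value": 100},
]


def summarise_by(records, key, field):
    """Group records by `key` and summarise the numeric `field`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[field])
    return {
        group: {"count": len(values), "total": sum(values), "mean": mean(values)}
        for group, values in groups.items()
    }


summary = summarise_by(RECORDS, "region", "value")
for region, stats in summary.items():
    print(region, stats)
```

Tools such as spreadsheets or Tableau Public do the same kind of grouping through a graphical interface.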

Online/offline open tools of data analysis

 Tableau Public:
 OpenRefine:
 Google Fusion Tables:

iv. Data visualization

It is the presentation of data in a pictorial and graphical format.

a. Non-Developers Visualization Tools

 Datawrapper:
 Infogram:
 Tableau Public:
 Plotly:
 ChartBlocks:

b. Developers Visualization Tools

 D3.js:
 FusionCharts:

c. Map-Based Visualization Tools

 Leaflet:

B. Publishing data

It is a process of releasing data in a published form for use and reuse by others.
Some of the most used open data formats are JavaScript Object Notation (JSON), Extensible Markup Language (XML), Resource Description Framework (RDF), spreadsheets, Comma-Separated Values (CSV) and plain text.
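
Moving between these formats is often straightforward. As a minimal sketch, the following Python converts CSV text to JSON with the standard library; the `csv_to_json` helper and the sample data are made up for this example.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ensure_ascii=False keeps non-ASCII text (e.g. Devanagari) readable.
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Made-up sample data.
SAMPLE_CSV = "district,value\nDistrict A,10\nDistrict B,20\n"

json_text = csv_to_json(SAMPLE_CSV)
print(json_text)
```

Note that every value comes out as a string, because CSV carries no type information; numbers would need to be converted explicitly before publishing.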

Note: Data that is published as an Excel table within a PDF document, without an open license, is not open data because it cannot be easily managed or reused.

Recommended Publishing Medium

a. Existing Data Portals

 Open Nepal Data Portal:
 Open Knowledge Nepal DataHub:

b. Independent medium

 GitHub:
 Google Drive:
 Dropbox:

C. Open Data Licensing

Data that does not explicitly have an open license is not open data. Creative content, such as text, photographs, slides, and so on, should be licensed using a Creative Commons license. Similarly, the Open Definition maintains a list of recommended conformant licenses used by different countries:


Now we have come to know about many issues and tools related to open data. I hope this blog post helps you explore and understand this new topic. I am heartily thankful to the resource persons of Open Knowledge Nepal.

Resource persons & Participants in Open Data Workshop

