The event was conducted in three sessions. Here is the summary of Session I:
A. What is Open Data?
According to Nikesh Balami, CEO of Open Knowledge Nepal, data must meet at least three criteria to count as open data:
i. Data should be available on the Internet:
It should be freely accessible through websites and data portals.
ii. Data should be machine-readable:
Formats progress from least to most reusable: PDF -> XLS -> CSV -> RDF -> LOD (also known as 5-star open data).
iii. Data should be open licensed:
Data that does not explicitly have an open license is not open data.
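As a rough illustration of how these criteria combine, here is a minimal Python sketch that assigns a star level to a dataset. The function name `star_rating` and the lookup table are hypothetical helpers written for this post, not part of any official tool; the star levels follow the 5-star open data scheme mentioned above.

```python
# Star levels of the 5-star open data scheme, keyed by the example
# format this post lists for each level.
FIVE_STAR_LEVELS = {
    "PDF": 1,  # on the web under an open license, in any format
    "XLS": 2,  # structured, machine-readable data
    "CSV": 3,  # structured data in a non-proprietary format
    "RDF": 4,  # open W3C standards and URIs to identify things
    "LOD": 5,  # linked to other data to provide context
}


def star_rating(data_format: str, open_licensed: bool) -> int:
    """Return a star level from 0 to 5 (hypothetical helper)."""
    if not open_licensed:
        # Without an explicit open license it is not open data at all.
        return 0
    # Any other format published openly still earns the first star.
    return FIVE_STAR_LEVELS.get(data_format.strip().upper(), 1)


print(star_rating("csv", open_licensed=True))   # 3
print(star_rating("pdf", open_licensed=False))  # 0
```

The license check comes first on purpose: a well-structured file without an open license still scores zero.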
B. Benefits of Open Data
i. For the government:
It increases the transparency and accountability of the government and so builds public trust. The Right to Information (RTI) Act 2007 guarantees that Nepali citizens can access information on the functioning of any ‘public body’ in order to make governance and policymaking more transparent and accountable. The National Information Commission (NIC) is responsible for promoting and implementing the RTI Act.
ii. For the students, academicians, researchers, entrepreneurs and startup firms:
It supports research and innovative projects, business growth, e-learning, and so on.
C. Principles of Open Data
- Machine readability
- Use of commonly owned standards
- Usage Cost
Here is the summary of Session II:
A. Current Situation in Nepal:
There are a few stakeholders in this sector: Civil Society Organizations (CSOs) such as Kathmandu Living Labs, Open Nepal, Open Knowledge Nepal, Freedom Forum, Code for Nepal, Accountability Lab, and Bikas Udhyami work in policy research, advocacy, technology, journalism, and so on.
List of CSOs (Source: Open Data Manual, compiled by Open Knowledge Nepal)
Kathmandu Living Labs: http://kathmandulivinglabs.org
Open Nepal: http://opennepal.net
Open Knowledge Nepal: http://oknp.org
Freedom Forum: http://freedomforum.org.np
Code for Nepal: http://codefornepal.org
Accountability Lab: http://accountabilitylab.org
Bikas Udhyami: http://bikasudhyami.com
B. Open data source for Nepal
The idea of open data entered Nepal in early 2013. Published data is still mostly not available in open formats (most of it is published as PDF). Although the RTI Act gives citizens the right to request and obtain data from any government body, the Act has no legal provision that compels government offices to open up their information.
List of some government data sources (Source: Open Data Manual, compiled by Open Knowledge Nepal)
Official Portal of Government of Nepal: https://www.nepal.gov.np
National Planning Commission: http://www.npc.gov.np/en
Central Bureau of Statistics: http://cbs.gov.np/home
Ministry of Finance: http://mof.gov.np/en/
Nepal Rastra Bank: https://www.nrb.org.np
Ministry of Home Affairs: http://www.moha.gov.np/home
Ministry of Education: http://www.moe.gov.np
Ministry of Health: http://mohp.gov.np
Election Commission Nepal: http://election.gov.np
Office of Company Registrar: http://ocr.gov.np
List of some international data sources:
World Bank: https://goo.gl/wFjYgH
United Nations: https://goo.gl/UGUoCh
UN Digital Repository in Nepal: http://un.info.np
UNICEF Nepal: https://goo.gl/QwHhJL
World Food Programme: https://goo.gl/2sG2aS
List of some CSO data sources:
Open Nepal: http://data.opennepal.net
Election Nepal: http://electionnepal.org
Nepal in Data: https://nepalindata.com
This session was followed by a demo of nepalmap.org and Nepal in Data.
Finally, Session III covered the following:
A. The process of working with data
i. Data Extraction
The process of retrieving data from non-machine-readable or unstructured sources (web pages, emails, PDF documents, scanned documents, and so on). Raw data cannot be accessed directly from such sources. There are the following ways of extracting data from PDF:
• Word/Excel converters to extract text from PDF: https://www.pdftoexcelonline.
• Programming, with some libraries existing for Python, Java, and the command line.
• Using Tabula - an offline open-source software specifically designed to get data out of PDF documents.
Other data extraction tools:
a. Basic scraping tools
b. Extracting data with Python
Python Mechanize: https://pypi.python.org/pypi/mechanize/
c. Web scraping tools
OutWit Hub: https://goo.gl/1Axk88
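To give a feel for what "extracting data with Python" can look like, here is a minimal sketch that pulls the rows of an HTML table out of a web page using only the standard library's `html.parser`. It is not the Mechanize or OutWit Hub workflow listed above; the class name `TableExtractor`, the function `extract_table`, and the sample HTML (with made-up district names and numbers) are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> row into a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html: str) -> list:
    """Return all table rows found in the given HTML text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Schools</th></tr>
  <tr><td>Example A</td><td>120</td></tr>
  <tr><td>Example B</td><td>85</td></tr>
</table>
"""

for row in extract_table(SAMPLE_HTML):
    print(row)
```

The extracted rows are plain lists of strings, so they can be written straight to a CSV file for the cleaning step.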
ii. Data cleaning
It is the process of fixing errors, duplicates, and format/standard inconsistencies in extracted data. Tools and languages: spreadsheets, OpenRefine, Python.
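The kinds of fixes described here can be sketched in a few lines of Python. The example below is only an illustration on made-up data: it trims stray spaces, normalizes name capitalization, converts numbers written with thousands separators, marks unusable values as missing, and drops duplicate rows. The function name `clean_rows` and the two-column layout are assumptions.

```python
import csv
import io

# Made-up raw extract showing typical problems: stray spaces,
# inconsistent capitalization, "1,200" vs 1200, and a non-numeric value.
RAW_CSV = """district, population
 Example A ,"1,200"
example a,1200
Example B,n/a
"""


def clean_rows(raw_text):
    """Return cleaned, de-duplicated rows as a list of dicts."""
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    header = [h.strip().lower() for h in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Collapse whitespace and use one capitalization style.
        name = " ".join(row[0].split()).title()
        # Remove thousands separators; keep None for unusable values.
        digits = row[1].replace(",", "").strip()
        population = int(digits) if digits.isdigit() else None
        key = (name, population)
        if key in seen:
            continue  # drop exact duplicates after normalization
        seen.add(key)
        cleaned.append({header[0]: name, header[1]: population})
    return cleaned


for record in clean_rows(RAW_CSV):
    print(record)
```

Duplicates are detected only after normalization, which is why " Example A " and "example a" count as the same row here.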
iii. Data analysis
It is the process of examining and exploring datasets in order to generate
Online/offline open tools of data analysis
Tableau Public: https://public.tableau.com/s/
Google Fusion Tables: https://goo.gl/XEFUVB
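For small datasets, a first look at the numbers does not need a dedicated tool. The following is a minimal sketch using Python's standard `statistics` module; the records are made up and the function name `summarize` is a hypothetical helper.

```python
from statistics import mean, median

# Made-up records standing in for a cleaned dataset.
records = [
    {"district": "Example A", "schools": 120},
    {"district": "Example B", "schools": 85},
    {"district": "Example C", "schools": 140},
]


def summarize(rows, field):
    """Return basic summary statistics for one numeric field."""
    values = [row[field] for row in rows]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


print(summarize(records, "schools"))
```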
iv. Data visualization
It is the presentation of data in a pictorial and graphical format.
a. Non-Developers Visualization Tools
Tableau Public: https://public.tableau.com/
Timeline JS: http://timeline.knightlab.com/
b. Developers Visualization Tools
Google Charts: https://developers.google.com/chart/
c. Map-Based Visualization Tools
B. Publishing data
It is the process of releasing data in a published form for use and reuse by others.
Note: Data that is published as an Excel table within a PDF document, without an open license, is not open data because it cannot be easily managed or reused.
Recommended Publishing Medium
a. Existing Data Portals
Open Nepal Data Portal: http://data.opennepal.net
Open Knowledge Nepal DataHub: https://old.datahub.io/organization/nepal
b. Independent medium
Google Drive: http://drive.google.com
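Whichever medium is used, it helps to publish the data in a machine-readable format together with a clear license statement. Here is a minimal sketch, assuming a hypothetical folder layout with a `data.csv` file and a small `metadata.json` file beside it; the function name `publish_dataset`, the metadata fields, the license identifier, and the sample rows are illustrative choices, not a format required by any of the portals above.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish_dataset(rows, fieldnames, out_dir, license_id):
    """Write rows to data.csv and a license note to metadata.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    data_path = out_dir / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Hypothetical metadata layout: the point is that the license
    # travels with the data in an explicit, readable form.
    metadata = {
        "title": "Example dataset",
        "license": license_id,
        "resources": [data_path.name],
    }
    meta_path = out_dir / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


# Made-up rows, written to a temporary folder for demonstration.
rows = [
    {"district": "Example A", "schools": 120},
    {"district": "Example B", "schools": 85},
]
with tempfile.TemporaryDirectory() as folder:
    data_file, meta_file = publish_dataset(
        rows, ["district", "schools"], folder, "CC-BY-4.0"
    )
    print(data_file.read_text(encoding="utf-8"))
    print(meta_file.read_text(encoding="utf-8"))
```

The resulting folder can then be uploaded to a data portal or shared through an independent medium.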
C. Open Data Licensing
Data that does not explicitly have an open license is not open data. Creative content, such as text, photographs, slides, and so on, should be licensed using a Creative Commons license. Similarly, the Open Definition lists recommended conformant licenses used by different countries: http://opendefinition.org/licenses/
We have now learned about many issues and tools related to open data. I hope this blog is helpful for exploring and understanding this new topic. I am heartily thankful to the resource persons of Open Knowledge Nepal.