I have always been very interested in physics and enjoy reading related books and articles or watching shows like Carl Sagan's Cosmos and Morgan Freeman's Through the Wormhole. For that matter, this site's name, *hiregion.com*, is derived from H-I-Region (an interstellar cloud).
When I saw NASA's "Send Your Name on NASA’s Journey to Mars, Starting with Orion’s First Flight" campaign, I was excited to submit the names of my family, relatives and friends, along with a few charity names. The names will be placed on a microchip aboard Orion's test flight on Dec. 4, 2014, which orbits the Earth, and on future journeys to Mars! The following quote is from the NASA site:
Your name will begin its journey on a dime-sized microchip when the agency’s Orion spacecraft launches Dec. 4 on its first flight, designated Exploration Flight Test-1. After a 4.5 hour, two-orbit mission around Earth to test Orion’s systems, the spacecraft will travel back through the atmosphere at speeds approaching 20,000 mph and temperatures near 4,000 degrees Fahrenheit, before splashing down in the Pacific Ocean.
But the journey for your name doesn’t end there. After returning to Earth, the names will fly on future NASA exploration flights and missions to Mars.
More info at
- NASA Orion name submissions (passengers :) by country
- Orion spacecraft
- Orion EFT-1 test flight
- Live takeoff on Dec.05, 2014 @4am PST - YouTube
Courtesy NASA/ Wikipedia.org
Some sample boarding passes:
By the time entries closed, on Oct. 31 I think, there were nearly 1.4 million names (1,379,961 exactly). The top countries by count were the United States, India and the United Kingdom, and people from almost all countries submitted their names. For more details see http://mars.nasa.gov/participate/send-your-name/orion-first-flight/world-participation-map/ . The bar chart below shows the same info.
Though the US, India and the UK were the top three by number of names submitted, I was curious to know how countries did when adjusted for population size, GDP and area (sq. miles). With that in mind, I pulled NASA data and country data from the following web sites:
- http://mars.nasa.gov/participate/send-your-name/orion-first-flight/world-participation-map/
- http://countrycode.org/
I built a quick Python script to pull the data, join the country data and perform some minor calculations. The code is located here at Gist, or see the end of this post.
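The join step is the part most likely to go wrong, since the two sites do not always spell country names the same way. Here is a minimal Python 3 sketch of joining by a normalized name. The sample dictionaries and the `normalize_name` / `join_by_country` helpers are made up for illustration; the normalization rules (lowercase, drop a leading "The ", collapse extra spaces) follow the script at the end of this post, with an extra `strip()` added.

```python
import re

def normalize_name(name):
    """Lowercase, drop a leading 'The ', and collapse runs of spaces."""
    name = name.strip().lower()
    name = re.sub(r'^the\s+', '', name)
    return re.sub(r'\s{2,}', ' ', name)

def join_by_country(counts, info):
    """Return (joined, unmatched): joined maps normalized name -> merged record."""
    info_by_key = {normalize_name(n): v for n, v in info.items()}
    joined, unmatched = {}, []
    for name, count in counts.items():
        key = normalize_name(name)
        if key in info_by_key:
            joined[key] = dict(info_by_key[key], names=count)
        else:
            unmatched.append(name)  # needs a manual mapping
    return joined, unmatched

# Hypothetical sample data, not the real figures.
nasa_counts = {"The Bahamas": 120, "United  Kingdom": 5000, "Atlantis": 3}
country_info = {
    "Bahamas": {"population": 300000},
    "United Kingdom": {"population": 60000000},
}

joined, unmatched = join_by_country(nasa_counts, country_info)
print(sorted(joined), unmatched)
```

Keeping the unmatched names in a separate list makes it easy to see which countries still need a hard-coded mapping.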
I then ran the data through a few R scripts, clustering countries based on each country's
- Orion passenger count/ 10K people
- Orion passenger count/ 1K sq. miles
- Orion passenger count/ Billion $ GDP
and then normalized them through R's scale function for cluster selection. The optimal number of clusters seems to be 7 or 8. Monaco and Singapore are major outliers because of the skew caused by their small geographical area (sq. miles). See below: Monaco is the single dangler at the top right, and Singapore/Hungary are at the bottom right but above the rest of the countries.
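For readers who prefer Python, the three metrics and the normalization step can be sketched as below. This is not the R code used for this post. The rows are made-up numbers, and `zscore` is my stand-in for R's scale with its defaults (center on the mean, divide by the sample standard deviation).

```python
from statistics import mean, stdev

# Hypothetical rows, NOT real figures:
# (country, names_submitted, population, area_sq_miles, gdp_in_millions_usd)
rows = [
    ("Country A", 1200, 4_000_000, 20_000, 50_000.0),
    ("Country B", 300, 500_000, 1_000, 8_000.0),
    ("Country C", 9000, 60_000_000, 300_000, 900_000.0),
    ("Country D", 50, 2_000_000, 90_000, 15_000.0),
]

def metrics(names, population, area, gdp_millions):
    """Return (per 10K people, per 1K sq. miles, per billion $ GDP)."""
    return (
        names * 10_000 / population,
        names * 1_000 / area,
        names * 1_000 / gdp_millions,  # GDP is in millions, so x1000 gives per billion
    )

def zscore(values):
    """Center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

raw = [metrics(*row[1:]) for row in rows]
# Scale each metric (column) independently, then turn columns back into rows.
scaled_columns = [zscore(list(col)) for col in zip(*raw)]
scaled = dict(zip((row[0] for row in rows), zip(*scaled_columns)))

for country, scores in scaled.items():
    print(country, [round(s, 3) for s in scores])
```

Scaling each column separately keeps one metric with a large range (here, names per 1K sq. miles for very small countries) from dominating the distance calculation during clustering.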
The scatter plot shows the two countries standing out much more clearly, especially in the middle tiles below: passengers_per_1K_sq_miles vs. the other two metrics (passengers_per_10K_population and passengers_per_1Billion_gdp).
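Removing those two countries from the data frame and clustering again results in the following:
That is an interesting cluster. Among the countries with the highest entries adjusted for population, GDP and geographic size, Hungary tops the list! Maldives, Hong Kong, the UK and Malta take the other four top places. Quick normalized scores look like: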
country | Score(/Pop.) | Score(/Area) | Score(/GDP) | Score_ABS |
---|---|---|---|---|
Hungary | 5.783493976 | 1.560361327 | 4.485219257 | 11.82907456 |
Maldives | 0.715814116 | 4.784567704 | 4.43908513 | 9.939466951 |
Hong Kong | -0.217141885 | 7.8493819 | -0.59223565 | 8.658759434 |
United Kingdom | 3.957774546 | 2.869764313 | 1.288187419 | 8.115726277 |
Malta | 1.085016478 | 5.903919255 | 0.393610721 | 7.382546454 |
Bangladesh | -0.195758981 | 1.116466958 | 4.697494631 | 6.00972057 |
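The Score_ABS column appears to be the sum of the absolute values of the three normalized scores (the Hong Kong and Bangladesh rows only add up that way). A small Python sketch that recomputes it from the rows above; the `score_abs` helper name is mine:

```python
# Normalized scores copied from the table: (per population, per area, per GDP)
rows = {
    "Hungary": (5.783493976, 1.560361327, 4.485219257),
    "Maldives": (0.715814116, 4.784567704, 4.43908513),
    "Hong Kong": (-0.217141885, 7.8493819, -0.59223565),
    "United Kingdom": (3.957774546, 2.869764313, 1.288187419),
    "Malta": (1.085016478, 5.903919255, 0.393610721),
    "Bangladesh": (-0.195758981, 1.116466958, 4.697494631),
}

def score_abs(scores):
    """Sum of absolute normalized scores (assumed definition of Score_ABS)."""
    return sum(abs(s) for s in scores)

# Rank countries from highest to lowest combined score.
ranked = sorted(rows, key=lambda c: score_abs(rows[c]), reverse=True)
for country in ranked:
    print(country, round(score_abs(rows[country]), 6))
```

Note that taking absolute values rewards a country for being far from the mean in either direction, so a strongly negative score counts as much as a strongly positive one.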
Cluster (optimal) size analysis:
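One common way to produce this kind of analysis is the "elbow" heuristic: run the clustering for several values of k and look for the point where the within-cluster sum of squares (WSS) stops dropping sharply. The Python sketch below assumes k-means, which may not be the algorithm the R scripts used. The points are toy data and the function names are mine, and the farthest-point initialization is just a simple deterministic choice.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def init_centers(points, k):
    """Start with the first point, then repeatedly add the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy data: two obvious groups, so the elbow should be at k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
wss_by_k = {k: kmeans_wss(points, k) for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))
```

With real data the drop is rarely this clean, which is probably why the answer here came out as "7 or 8" rather than a single number.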
It is always fun to play around with different ways to slice and dice data. The bubble world map below shows a simple metric: passenger count per billion dollars of GDP.
The top 5 countries, in this case, are:
Country | Passengers per billion $ GDP |
---|---|
Bangladesh | 133.95982 |
Hungary | 128.75381 |
Maldives | 127.62238 |
Philippines | 125.95591 |
Kosovo | 106.8 |
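The per-GDP metric depends on turning strings such as "22.27 Billion" into a number first. A minimal Python 3 sketch of that step, mirroring the logic in the script at the end of this post; the function names and the sample inputs in the usage lines are hypothetical:

```python
import re

def gdp_in_millions(text):
    """Parse strings like '22.27 Billion' or '1.5 Trillion' into millions of dollars.
    Values with no unit word are assumed to already be in millions."""
    match = re.match(r'(\d+\.?\d*)', text.strip())
    if not match:
        raise ValueError("no leading number in GDP string: %r" % text)
    value = float(match.group(1))
    if re.search(r'trillion', text, re.I):
        return value * 1_000_000
    if re.search(r'billion', text, re.I):
        return value * 1_000
    return value

def names_per_billion_gdp(names, gdp_text):
    """Names submitted per billion dollars of GDP."""
    return names * 1000 / gdp_in_millions(gdp_text)

# Hypothetical inputs, just to show the arithmetic.
print(names_per_billion_gdp(250, "2 Billion"))
print(names_per_billion_gdp(3000, "1.5 Trillion"))
```

Normalizing everything to millions first keeps the metric formula the same for every country, whatever unit the source page used.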
It would be more interesting to see how the numbers relate to each country's science and technology budget. I will try doing that in the next few days, as some of the data is already available in the wild. In an ideal world, a good percentage of the yearly budget would be allocated to science and technology.
Data pull Python code:
#!/Users/shiva/anaconda/bin/python
# -*- coding: utf-8 -*-
# Note: this is a Python 2 script (print statements, str.translate(None, ...), locale.format).

import os
import sys
import re
import locale
import pprint
import scraperwiki
from bs4 import BeautifulSoup
from collections import defaultdict


class NasaData():
    nasa_file_path = "/tmp/nasa_orion_reg_by_country.txt"
    ctry_file_path = "/tmp/countrycode_org_data.txt"
    nasa_site = "http://mars.nasa.gov/participate/send-your-name/orion-first-flight/world-participation-map/"
    ctry_site = "http://countrycode.org/"
    metrics_file_path = "/tmp/nasa_metrics_by_country.txt"

    def __init__(self):
        pass


def get_nasa_entries():
    '''
    Scrape NASA Orion participants count by country data
    Output to file nasa_orion_reg_by_country.txt
    Args: None
    '''
    html = scraperwiki.scrape( NasaData.nasa_site )
    soup = BeautifulSoup( html )

    out_file = NasaData.nasa_file_path
    if os.path.exists( out_file ) and os.path.getsize( out_file ) > 10:
        print "Warning: " + out_file + " exists. Continuing without scraping NASA data.\n"
        return False

    countries = soup.find( 'ul', class_='countryList' )
    with open( out_file, 'wt' ) as fh:
        for country in countries.findAll('li'):
            c_name = country.find('div', class_='countryName').text
            c_num = country.find('div', class_='countNumber').text.strip()
            # line = c_name + "," + c_num + "\n"
            line = ''.join([c_name, ',', c_num, '\n'])
            fh.write(line)

    return True


def get_country_details():
    '''
    Scrape countrycode data including population, gdp, area, etc.
    Dump output to file countrycode_org_data.txt
    Args: None
    '''
    html = scraperwiki.scrape(NasaData.ctry_site)
    soup = BeautifulSoup(html)

    out_file = NasaData.ctry_file_path
    if os.path.exists( out_file ) and os.path.getsize( out_file ) > 10:
        print "Warning: " + out_file + " exists. Continuing without scraping COUNTRY_CODE data.\n"
        return False

    cnty_table = soup.find( lambda tag: tag.name == 'table' and tag.has_attr('id') and tag['id'] == "main_table_blue" )
    countries = cnty_table.findAll( lambda tag: tag.name == 'tr' )
    with open( out_file, 'wt' ) as fh:
        for country in ( countries ):
            cnty_str = '|'
            cnty_attr = country.findAll( lambda tag: tag.name == 'th' )
            if ( cnty_attr ):
                # Header row
                for attr in ( cnty_attr ):
                    cnty_str += attr.contents[0] + "|"
            else:
                cnty_attr = country.findAll( lambda tag: tag.name == 'td' )
                if ( cnty_attr ):
                    for ix, val in ( enumerate(cnty_attr) ):
                        if ix == 0:
                            cnty_str += val.findAll( lambda tag: tag.name == 'a' )[0].string + "|"  # Get country name
                        else:
                            cnty_str += val.contents[0].strip() + "|"  # Get country attrs
            # print cnty_str
            fh.write( cnty_str + "\n" )

    return True


def join_country_data():
    '''
    Join two data sets by country name and write to file nasa_metrics_by_country.txt
    country names and its metrics
    Args: None
    '''
    fh = open( NasaData.metrics_file_path, 'wt' )

    # Country names lowercased, removed leading "The ", removed leading/trailing and extra spaces
    nasa_data = defaultdict(list)
    cc_org_data = {}
    for line in open( NasaData.nasa_file_path, 'rt' ):
        ln_els = line.strip('\n').split(',')
        ln_els[0] = ln_els[0].lower()
        ln_els[0] = re.sub(r'(^[Tt]he\s+)', '', ln_els[0])
        ln_els[0] = re.sub(r'(\s{2,})', ' ', ln_els[0])
        nasa_data[ln_els[0]].append(ln_els[1])  # orion_vote appended

    # nasa_data dict appended with country data. key:country => values[orion_votes, pop., area, gdp]
    for l_num, line in enumerate( open( NasaData.ctry_file_path, 'rt') ):
        # line: |Afghanistan|AF / AFG|93|28,396,000|652,230|22.27 Billion|
        if l_num == 0: continue  # Skip header
        ln_els = line.strip('\n').split('|')
        ln_els[1] = ln_els[1].lower()
        ln_els[1] = re.sub(r'(^[Tt]he\s+)', '', ln_els[1])
        ln_els[1] = re.sub(r'(\s{2,})', ' ', ln_els[1])

        # Strip out comma in pop(element 4) and area (5)
        nasa_data[ln_els[1]].append( ln_els[4].translate(None, ',') )  # pop appended
        nasa_data[ln_els[1]].append( ln_els[5].translate(None, ',') )  # area appended

        # Normalize gdp to millions
        gdp = re.match( r'(\d+\.?\d*)', ln_els[6] ).group(0)
        gdp = float(gdp)
        if re.search( r'(Billion)', ln_els[6], re.I ):
            gdp = gdp * 1000
        elif re.search( r'(Trillion)', ln_els[6], re.I ):
            gdp = gdp * 1000000
        nasa_data[ln_els[1]].append( gdp )  # gdp appended

    # TODO: Some country names are not standard in NASA data. Example French Guiana is either Guiana or Guyana
    # Delete what is not found in country code data or match countries with hard coded values
    locale.setlocale(locale.LC_ALL, '')
    for cn in sorted(nasa_data):  # country name
        # array has all nasa_votes, pop., sq miles, gdp and has pop > 0 and gdp > 0. Capitalize name.
        if len(nasa_data[cn]) > 3 and int(nasa_data[cn][1]) > 0 and int(nasa_data[cn][3]) > 0:
            l = ( cn.title() + ":" + nasa_data[cn][0]
                  + ":" + locale.format( '%d', int(nasa_data[cn][1]), 1 )  # pop
                  + ":" + str( round( float( nasa_data[cn][0] ) * 10000/ int(nasa_data[cn][1]), 5 ))  # per 10K pop
                  + ":" + locale.format( '%d', int(nasa_data[cn][2]), 1 )  # area
                  + ":" + str( round( float( nasa_data[cn][0]) * 1000 / int(nasa_data[cn][2]), 5 ))  # per 1K sq mile
                  + ":" + locale.format( '%d', int(nasa_data[cn][3]), 1 )  # gdp
                  + ":" + str( round( float( nasa_data[cn][0]) * 1000 / nasa_data[cn][3], 5 ))  # per Billion $ gdp
                  + "\n" )
            fh.write(l)

    fh.close()
    return True


if __name__ == "__main__":
    get_nasa_entries()
    get_country_details()
    join_country_data()
    exit( 0 )
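The script above was written for Python 2. If you want to run the same steps under Python 3, a few calls need replacing. Here is a minimal sketch of the equivalents; the helper names are mine and are not part of the script. Note that the f-string version always groups with commas, while the original depends on the system locale.

```python
def strip_commas(text):
    """Python 2 version in the script: text.translate(None, ',')"""
    return text.replace(',', '')

def group_thousands(n):
    """Python 2 version in the script: locale.format('%d', n, 1).
    The ',' format spec always uses commas, regardless of locale."""
    return f"{n:,}"

def warn_exists(path):
    """print is a function in Python 3, so the message needs parentheses."""
    message = "Warning: " + path + " exists. Continuing without scraping."
    print(message)
    return message

population = int(strip_commas("28,396,000"))
print(population, group_thousands(population))
warn_exists("/tmp/nasa_orion_reg_by_country.txt")
```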