Name:
Answer Key



Write scripts for the following questions.
Begin each script with a comment that includes your name, the date, the question number, and a one-line description of what the script does.
Upload each script to the course website. Do not submit Jupyter notebook (.ipynb) files!


  1. Datafile handling

    Download this file for testing purposes:

    Name this script "question" (or "q-stats.py"). Write a Python3 script that can be run from the command line with no command-line arguments. It prompts for a filename, opens and reads the file, then calculates the arithmetic mean (average), harmonic mean, and geometric mean of each column of values. Do not use the numpy module for this problem. It will be helpful to open the datafile in a text editor and examine its format.

    For full credit, create functions to calculate each of the means.

    Write it to do the following:
    1. [1 pt] Prompt for a data-file name, and get the name from user input.
    2. [3 pts] Open and read the file one line at a time. Each line will either be a blank line, a comment line that starts with a "#", or a data line that contains three floating point values. Ignore blank lines and comments, and collect the values into three lists — one each for first, second, and last values on the lines.
    3. [3 pts] Calculate and print out a list containing the arithmetic means (averages) of each column of values. The result should look like this:
      arithmetic means: [2.6074895432339162, 5.20562171446492, 6.905943485536829]
      
    4. [3 pts] Calculate and print out a list containing the harmonic means of each column of values. The harmonic mean is defined as:
      \mu_h = n / \sum_{i=1}^{n} (1 / value_i)
      i.e., the number of values, divided by the sum of the inverses of all the values.
      harmonic means: [0.5560924504671169, 1.21995125127702, 1.7010902157363277]
      
    5. [3 pts] Calculate and print out a list containing the geometric means of each column of values. The geometric mean is defined as:
      \mu_g = ( \prod_{i=1}^{n} value_i )^{1/n}
      i.e., the nth root of the product of all the values.
      geometric means: [1.6619671254633932, 3.155222818080285, 3.8202727543730526]
      
    6. [2 pts] Print the lists, along with the list lengths, in a formatted table like this:
               lengths:      100      100      100
      arithmetic means:    2.607    5.206    6.906
        harmonic means:    0.556    1.220    1.701
       geometric means:    1.662    3.155    3.820
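
One possible way to organize the pieces of question 1 is sketched below. It is an illustration rather than the official solution: the function names are made up, the data file is assumed to separate its three values with whitespace or commas, and the geometric mean is computed through logarithms (equivalent to the nth root of the product for positive values).

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def harmonic_mean(values):
    """Number of values divided by the sum of their inverses."""
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean(values):
    """nth root of the product, computed through logarithms.

    Summing logs avoids overflow on long columns; assumes positive values.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


def read_columns(lines):
    """Collect the three values on each data line into three lists."""
    columns = [[], [], []]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumption: values are separated by whitespace and/or commas.
        fields = line.replace(",", " ").split()
        for column, field in zip(columns, fields):
            column.append(float(field))
    return columns


def format_row(label, numbers, spec):
    """Right-align the label in 16 characters, then format each number."""
    return f"{label:>16}:" + "".join(format(n, spec) for n in numbers)


def report(filename):
    """Read the file, then print the lists of means and the summary table."""
    with open(filename) as datafile:
        columns = read_columns(datafile)
    rows = [
        ("arithmetic means", [arithmetic_mean(c) for c in columns]),
        ("harmonic means", [harmonic_mean(c) for c in columns]),
        ("geometric means", [geometric_mean(c) for c in columns]),
    ]
    for label, means in rows:
        print(f"{label}: {means}")
    print(format_row("lengths", [len(c) for c in columns], "9d"))
    for label, means in rows:
        print(format_row(label, means, "9.3f"))
```

A script built this way would end with something like `report(input("Data file name: "))` to cover the prompt in step 1.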
      
  2. Website reading and regular expressions

    Name this script "question" (or "q-url.py").

    Write a Python3 script that can be run from the command line, with one or more arguments. Each argument (after the first one, the file's name) will be a URL. The script should do each of the following tasks (each is worth separate points; you can write one script to do all tasks):

    1. [5 pts] Retrieve and open the URL. The URL may begin with "http://" or with "https://", and retrieving it may not succeed (for example, the website could be down). Read in the contents. Remember that the contents will read in as bytes; you must decode them to text.

      Some possible URLs include:

      • http://bloomu.edu
      • http://virtualbox.org
      • https://google.com
      • https://nytimes.com
      • https://montcs.bloomu.edu/215/

      Note that automatically generated webpages might not be divided into separate lines, so the "read()" method is most appropriate.

      Report the URL and the total length of the contents (note that the URL's contents may change from day to day).

      $ python3  q-url.py  https://google.com  http://virtualbox.org
      https://google.com: not utf-8
      https://google.com
      	11380 characters
      http://virtualbox.org
      	9989 characters
      #--------
      $
      
    2. [5 pts] Use a regular expression to find all hyperlink tags in the pages that you retrieved. A hyperlink tag is a string that starts with "<a href=" and ends with ">".

      Print all hyperlinks for each URL, and print the number of hyperlinks. The results should look similar to this:

      $ python3  q-url.py  https://google.com  http://virtualbox.org
      https://google.com: not utf-8
      https://google.com
      	11380 characters
      http://virtualbox.org
      	9989 characters
      #--------
      https://google.com
           <a href="http://www.google.com/history/optout?hl=en" class=gb4>
           <a href="/advanced_search?hl=en&authuser=0">
           <a href="/language_tools?hl=en&authuser=0">
           <a href="/intl/en/ads/">
           <a href="/services/">
           <a href="/intl/en/about.html">
           <a href="/intl/en/policies/privacy/">
           <a href="/intl/en/policies/terms/">
      8 links.
      #----
      http://virtualbox.org
           <a href="/wiki/VirtualBox">
           <a href="/wiki/Screenshots">
           <a href="/wiki/Downloads">
           <a href="/wiki/Documentation">
           <a href="/wiki/End-user_documentation">
           <a href="/wiki/Technical_documentation">
           <a href="/wiki/Contributor_information">
           <a href="/wiki/Community">
           <a href="/">
           <a href="/wiki/VirtualBox">
           <a href="/login">
      
                  
      
           <a href="https://www.oracle.com">
           <a href="https://www.oracle.com/virtualization/virtualbox/resources.html">
           <a href="https://www.oracle.com/html/privacy.html">
           <a href="https://www.oracle.com/html/terms.html">
      27 links.
      #----
      $
      
    3. [5 pts] For each link in each file, strip off the <a href=" part at the beginning, and everything after the next double-quote ("), to leave only the linked URL itself. Save these URLs into lists (one for each of the original command-line URLs). Finally, print out the average length of the included URLs for each original URL.

      Note that there are multiple ways to remove the extra text from beginning and end — for example, by slicing, or lstrip() and rstrip(). Use whatever method works for you.

      $ python3  q-url.py  https://google.com  http://virtualbox.org
      https://google.com: not utf-8
      https://google.com
      	11331 characters
      http://virtualbox.org
      	9989 characters
      #--------
      https://google.com
           <a href="http://www.google.com/history/optout?hl=en" class=gb4>
           <a href="/advanced_search?hl=en&authuser=0">
           <a href="/language_tools?hl=en&authuser=0">
           <a href="/intl/en/ads/">
           <a href="/services/">
           <a href="/intl/en/about.html">
           <a href="/intl/en/policies/privacy/">
           <a href="/intl/en/policies/terms/">
      8 links.
      #----
      http://virtualbox.org
           <a href="/wiki/VirtualBox">
           <a href="/wiki/Screenshots">
           <a href="/wiki/Downloads">
      
                  
      
           <a href="https://www.oracle.com/virtualization/virtualbox/resources.html">
           <a href="https://www.oracle.com/html/privacy.html">
           <a href="https://www.oracle.com/html/terms.html">
      27 links.
      #----
      https://google.com: 25.875 average link length.
      http://virtualbox.org: 27.000 average link length.
      #--------
      $
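
The three tasks of question 2 could be split into small helpers along the following lines. This is a sketch under assumptions, not the reference answer: the helper names are invented, the fallback to latin-1 when utf-8 decoding fails is one choice among several, and the 10-second timeout is arbitrary.

```python
import re
import urllib.request

# A hyperlink tag starts with "<a href=" and ends at the next ">".
LINK_TAG = re.compile(r'<a href=[^>]*>')
# The linked URL is whatever sits between the first pair of double quotes.
LINK_URL = re.compile(r'<a href="([^"]*)"')


def decode_bytes(raw):
    """Return (text, was_utf8); fall back to latin-1, which accepts any byte."""
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        return raw.decode("latin-1"), False


def fetch_text(url):
    """Return the page text, or None if the URL cannot be retrieved."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            raw = response.read()
    except (OSError, ValueError) as error:  # URLError is a subclass of OSError
        print(f"{url}: could not retrieve ({error})")
        return None
    text, was_utf8 = decode_bytes(raw)
    if not was_utf8:
        print(f"{url}: not utf-8")
    return text


def find_links(text):
    """Return every hyperlink tag found in the text."""
    return LINK_TAG.findall(text)


def extract_urls(tags):
    """Strip each tag down to the URL inside its double quotes."""
    urls = []
    for tag in tags:
        match = LINK_URL.match(tag)
        if match:  # tags with an unquoted href are skipped
            urls.append(match.group(1))
    return urls


def average_length(urls):
    """Average number of characters per URL (0.0 for an empty list)."""
    return sum(len(u) for u in urls) / len(urls) if urls else 0.0
```

The main part of the script would loop over `sys.argv[1:]`, call `fetch_text` on each argument, and print the counts and averages in the layout shown in the sample runs.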
      
  3. Regular expressions

    Download one or more of these files for testing purposes:

    Name this script "question" (or "q-regex.py"). Write it to do each of the following:

    1. [5 pts] Accept one command-line argument that is the name of a text file (which contains a webserver log). Open and read the file. Report the number of lines in the file.
    2. [5 pts] Search each line for a string that starts with an IP address and includes one of the http-verbs "GET" or "HEAD", followed by some text, followed by an "error-code" string.

      The "error-code" string is a 3-digit number starting with 4, followed by either a 0 or a 1, then followed by a third digit. For example, the code "404" indicates a URL that was not found on the webserver, while "413" indicates "Request Entity Too Large". So these are lists of requested URLs that weren't provided for some reason.

      Create a dictionary of lines that match. The keys of the dictionary must be tuples containing the http-verb ("GET" or "HEAD") and the error-code. The value for each key is a list containing more tuples. These tuples contain the IP address and the substring that lies between the http-verb and the error code.

      Then print each key, and the length of its list, in a formatted line.

      A sample run is shown here:
      $ python3  q-regex.py access.log.0
      17931 lines in access.log.0
      Key: ('GET', '404')     ---  969 matches
      Key: ('GET', '403')     ---   27 matches
      Key: ('GET', '407')     ---    2 matches
      Key: ('GET', '417')     ---    1 matches
      Key: ('GET', '415')     ---    2 matches
      Key: ('GET', '416')     ---    1 matches
      Key: ('GET', '408')     ---    2 matches
      Key: ('GET', '411')     ---    1 matches
      Key: ('GET', '410')     ---    1 matches
      Key: ('GET', '409')     ---    1 matches
      Key: ('HEAD', '404')    ---    1 matches
      $
      
    3. [5 pts] The script must additionally open and write an output file named "report.txt". This file must contain:
      • Each (http-verb, error-code) key
      • for each key, each of its matches, indented 4 spaces.

      The output file should look like this:

      $ cat report.txt
      ('GET', '404')
           ('50.29.198.151', '/Graphics/bullets/dot.4x4.gif HTTP/1.1"')
           ('176.240.196.146', '/~bobmon/css/navtabs.css HTTP/1.1"')
           ('208.115.111.74', '/%7Ebobmon/readings/scratch_monkeys.html HTTP/1.1"')
           ('72.79.165.247', '/491/grades.shtml HTTP/1.1"')
      
              
      
      ('GET', '411')
           ('108.57.48.110', '/~bobmon/cgi-bin/date.php HTTP/1.1" 200')
      ('GET', '410')
           (None, '/~bobmon/cgi-bin/date.php HTTP/1.1" 200')
      ('GET', '409')
           ('108.57.48.110', '/~bobmon/cgi-bin/makesched.php?datafile=/~bobmon/330/330-sched.dat HTTP/1.1" 200')
      ('HEAD', '404')
           ('193.227.174.10', '/instructor-info.html HTTP/1.1"')
      $
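
A sketch of how the dictionary for question 3 might be built is shown below. The regular expression is only one plausible reading of the task: it assumes each log line begins with a dotted-quad IP address and that the error code is preceded by whitespace. The function names are hypothetical.

```python
import re

LOG_PATTERN = re.compile(
    r'^(\d{1,3}(?:\.\d{1,3}){3})'  # IP address at the start of the line (assumed)
    r'.*?\b(GET|HEAD)\s+'          # the http-verb
    r'(.*?)\s+'                    # text between the verb and the error code
    r'(4[01]\d)\b'                 # 3-digit code: 4, then 0 or 1, then any digit
)


def collect_matches(lines):
    """Map (verb, code) to a list of (ip, text between verb and code)."""
    matches = {}
    for line in lines:
        found = LOG_PATTERN.search(line)
        if found:
            ip, verb, between, code = found.groups()
            matches.setdefault((verb, code), []).append((ip, between))
    return matches


def summary_line(key, count):
    """Format one 'Key: ... --- N matches' line."""
    return f"Key: {str(key):<19}---{count:5d} matches"


def write_report(matches, filename="report.txt"):
    """Write each key, followed by its matches indented 4 spaces."""
    with open(filename, "w") as report:
        for key, entries in matches.items():
            report.write(f"{key}\n")
            for entry in entries:
                report.write(f"    {entry}\n")
```

The script itself would read the file named in `sys.argv[1]`, print the line count, print `summary_line(key, len(entries))` for each key, and then call `write_report`.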
      
  4. matplotlib

    Name this script "question" (or "q-plot.py"). Write it to do each of the following:
    1. Use a web browser to download these three files. Each file contains lines that have two numbers on them: the first is a numeric date, and the second is a measurement. Here are some sample lines:
      # date        measurement
      1971.875  0.445079406236
      1972.30769231  0.421696503429
      # no entry for 1973
      1974.32692308  0.922239565104
          
      
    2. [5 pts] Write code to open each file and read each line.

      If the line begins with "#" then skip it (it's a comment);
      otherwise split the line into two values and convert each value to a floating-point number.

      Save the data from each file into dictionaries, where the first number is the key and the second number is the value. Here are the entries corresponding to the sample data above:

          {1971.875: 0.445079406236,
           1972.30769231: 0.421696503429,
           1974.32692308: 0.922239565104,
              
          

      Print each dictionary.

    3. [10 pts] [figure: transistors]
      Create a matplotlib plot that graphs each dictionary. The keys are the x values, and the corresponding values are the y values.

      Set the Y axis to be logarithmic with a statement like this:

       plt.yscale('log')
      Also add a legend.

      For full credit, make a graph that looks like this:
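
The reading and plotting steps of question 4 could look roughly like this. It is a sketch with assumptions: the function names and the labels used for the legend are made up, blank lines are skipped in addition to comments, and matplotlib is imported inside the plotting function only so that the parsing helper can be used without it.

```python
def read_measurements(lines):
    """Build a {date: measurement} dictionary from the lines of one data file."""
    data = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment (or blank) line
        date_text, value_text = line.split()
        data[float(date_text)] = float(value_text)
    return data


def plot_series(series, filename=None):
    """Plot each dictionary in `series` (a {label: data} mapping) on a log Y axis."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it

    for label, data in series.items():
        dates = sorted(data)  # keys are the x values
        plt.plot(dates, [data[d] for d in dates], label=label)
    plt.yscale('log')
    plt.legend()
    if filename:
        plt.savefig(filename)
    else:
        plt.show()
```

Each file would be opened with `open(name)` and passed to `read_measurements`; the resulting dictionaries can then be printed and handed to `plot_series` under whatever labels suit the data.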
  5. numpy and matplotlib

    Name this script "question" (or "q-numpy.py"). Write it to do the following:
    1. [5 pts] Create a numpy array. The array must have three dimensions — the first dimension is 5 elements long; the second dimension is 6 elements long; the third dimension is 4 elements long. Fill the array with random floating-point numbers.

      Create a reshaped copy of your first array, that has two dimensions — 10 rows by 12 columns.

      Print the "shape" of each of your arrays.

      Your output should look like this:
      $ python3  q-numpy.py
      First array: shape (5, 6, 4)
      Second array: shape (10, 12)
      
      
    2. [5 pts] Generate a third array whose elements are a function of the elements in the second (2-dimensional) array. The function is f(x) = \sin(x) \cdot \cos(x^2), for each value in the two-dimensional array. Thus the third array is also a two-dimensional array, each of whose elements is f(the corresponding second-array element).

      Print the third array. (Hint: try the pprint module to make the output look nicer, as seen here. This isn't required.)

      $ python3  q-numpy.py
      
          
      
      Third array:
      array([[0.14651342, 0.28266526, 0.1007512 , 0.5482164 , 0.00231006,
              0.14057223, 0.56557999, 0.50291083, 0.51726606, 0.37849568,
              0.53080959, 0.38931452],
             [0.34550465, 0.18984179, 0.1749067 , 0.12807764, 0.46784614,
              0.54561654, 0.56440489, 0.47382489, 0.46937581, 0.43717831,
              0.07963974, 0.1021673 ],
             [0.13352766, 0.35146559, 0.45058095, 0.46757252, 0.55055506,
              0.54047353, 0.06131804, 0.52630785, 0.19511769, 0.49692412,
              0.56505804, 0.12979144],
             [0.57299221, 0.38774584, 0.57732272, 0.03732514, 0.45718476,
              0.49709878, 0.14970336, 0.14335622, 0.11751473, 0.57732247,
              0.54203592, 0.45960362],
             [0.32710445, 0.39486948, 0.56364191, 0.53332977, 0.24027803,
              0.26836029, 0.57723997, 0.39758139, 0.20164227, 0.47062311,
              0.47850417, 0.57594249],
             [0.51873473, 0.4656815 , 0.48147651, 0.48181525, 0.15620491,
              0.17236338, 0.23074127, 0.29422631, 0.51705904, 0.3190793 ,
              0.54373197, 0.45946981],
             [0.54390652, 0.19675682, 0.47630461, 0.21717129, 0.5708821 ,
              0.12464502, 0.10299517, 0.42332459, 0.19683402, 0.13741823,
              0.27593077, 0.56309188],
             [0.51741854, 0.10071471, 0.01043648, 0.47306764, 0.15042324,
              0.39123524, 0.42296891, 0.37602383, 0.06039781, 0.5035686 ,
              0.48728796, 0.52363985],
             [0.40034186, 0.2806177 , 0.5485161 , 0.22654447, 0.47039861,
              0.52922965, 0.44879987, 0.47240229, 0.5087267 , 0.11278457,
              0.57639249, 0.53688795],
             [0.46566853, 0.50424247, 0.56633177, 0.48060833, 0.48557023,
              0.22872274, 0.11375626, 0.51535727, 0.48372245, 0.28384408,
              0.09471029, 0.04051136]])
      $
      
    3. [5 pts] [figure: random arrays]

      Use matplotlib to make a one-column, two-row figure containing filled contour plots of each of the second array ("surface") and third array ("sin(surface)"). Each plot should have a title. Your figure should look similar to the figure at right:
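
A minimal sketch of the array steps in question 5 follows. It assumes the random numbers may come from numpy's `default_rng` generator (uniform on [0, 1)) and that f squares its argument inside the cosine; the function names and the optional `seed` parameter are illustrative only.

```python
import numpy as np


def make_arrays(seed=None):
    """Return the 5x6x4 random array, its 10x12 reshaped copy, and f(second)."""
    rng = np.random.default_rng(seed)
    first = rng.random((5, 6, 4))           # 5 * 6 * 4 = 120 elements
    second = first.reshape(10, 12).copy()   # 10 * 12 = 120, so the reshape is valid
    third = np.sin(second) * np.cos(second ** 2)  # applied element by element
    return first, second, third


def plot_contours(second, third, filename=None):
    """Draw the two filled contour plots in a one-column, two-row figure."""
    import matplotlib.pyplot as plt  # imported here so the array code works without it

    fig, axes = plt.subplots(2, 1)
    axes[0].contourf(second)
    axes[0].set_title("surface")
    axes[1].contourf(third)
    axes[1].set_title("sin(surface)")
    fig.tight_layout()
    if filename:
        fig.savefig(filename)
    else:
        plt.show()
```

Printing `first.shape` and `second.shape` gives the two tuples shown in the sample output, and `pprint.pprint(third)` is one way to print the third array.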

  6. Name this script "question" (or "q-csv.py"). Download this file for use in your script:

    Write a Python3 script that can be run from the command line, with no arguments. The script should do each of the following tasks (each is worth separate points; you can write one script to do all tasks):
    1. [5 pts] Open the file "1895-2020.PA-May.mean-temperature.csv" as a csv file. Create a "csv.reader()" object.
    2. [10 pts] Using the reader object, read the first five lines into an array of headers. Then read the rest of the data into three lists — numbers representing the year and month (for example, "202004" for April of 2020); average temperature for the corresponding year; and an "anomaly" that is the difference between that year's average temperature and a baseline value.

      Calculate the average of the average temperatures, and of the anomalies.

      Print each list of data, and the averages. The output should look similar to this:

      $ python3  q-csv.py
      Headers: [['Pennsylvania', ' Average Temperature', ' January-December'], ['Units: Degrees Fahrenheit'], ['Base Period: 1901-2000'], ['Missing: -99'], ['Date', 'Value', 'Anomaly']]
      
      Years: [1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
      
      mean temperatures: [46.6, 48.0, 47.9, 49.2, 47.9, 49.2, 47.2, 47.6, 47.6, 45.3, 47.1, 48.6, 46.2, 48.5, 47.9, 47.6, 49.1, 47.0, 49.6, 46.9, 47.8, 47.5, 45.2, 48.1, 49.0, 47.0, 50.6, 48.9, 47.8, 46.0, 47.8, 46.2, 48.4, 47.9, 47.9, 48.8, 50.4, 49.2, 49.2, 48.1, 47.5, 47.8, 48.2, 49.4, 49.1, 46.0, 48.8, 48.4, 47.5, 48.5, 48.2, 49.5, 48.2, 48.2, 50.4, 47.4, 48.3, 49.1, 50.1, 48.8, 48.8, 47.9, 48.8, 46.3, 49.2, 47.2, 48.0, 46.9, 46.5, 48.0, 47.6, 47.3, 47.0, 47.6, 47.5, 48.1, 48.2, 47.2, 49.7, 48.2, 48.9, 46.8, 47.9, 46.4, 47.5, 47.3, 47.5, 47.9, 48.7, 48.2, 48.5, 48.8, 49.2, 48.1, 47.5, 50.7, 50.7, 47.6, 48.2, 48.0, 48.5, 47.6, 48.0, 51.8, 50.0, 48.5, 49.9, 50.3, 48.0, 49.1, 49.5, 50.8, 49.5, 48.8, 48.4, 50.1, 50.4, 51.8, 49.0, 47.4, 49.5, 50.9, 50.6, 49.8, 49.9]
      long-term average: 48.3576
      
      Anomalies: [-1.5, -0.1, -0.2, 1.1, -0.2, 1.1, -0.9, -0.5, -0.5, -2.8, -1.0, 0.5, -1.9, 0.4, -0.2, -0.5, 1.0, -1.1, 1.5, -1.2, -0.3, -0.6, -2.9, 0.0, 0.9, -1.1, 2.5, 0.8, -0.3, -2.1, -0.3, -1.9, 0.3, -0.2, -0.2, 0.7, 2.3, 1.1, 1.1, 0.0, -0.6, -0.3, 0.1, 1.3, 1.0, -2.1, 0.7, 0.3, -0.6, 0.4, 0.1, 1.4, 0.1, 0.1, 2.3, -0.7, 0.2, 1.0, 2.0, 0.7, 0.7, -0.2, 0.7, -1.8, 1.1, -0.9, -0.1, -1.2, -1.6, -0.1, -0.5, -0.8, -1.1, -0.5, -0.6, 0.0, 0.1, -0.9, 1.6, 0.1, 0.8, -1.3, -0.2, -1.7, -0.6, -0.8, -0.6, -0.2, 0.6, 0.1, 0.4, 0.7, 1.1, 0.0, -0.6, 2.6, 2.6, -0.5, 0.1, -0.1, 0.4, -0.5, -0.1, 3.7, 1.9, 0.4, 1.8, 2.2, -0.1, 1.0, 1.4, 2.7, 1.4, 0.7, 0.3, 2.0, 2.3, 3.7, 0.9, -0.7, 1.4, 2.8, 2.5, 1.7, 1.8]
      long-term anomaly: 0.25759999999999994
      
      
      (Actual output will wrap lines.)
    3. [5 pts] Use matplotlib to plot the yearly temperatures. Also plot a horizontal line at y = average(mean temperatures). Save the plot into a file named "meantemps.png".
    4. [5 pts] Use matplotlib to plot the anomalies. Also plot a horizontal line at y = 0, and another horizontal line at y = average(anomalies). Save the plot into a file named "anomalies.png".
    The plots should look like these:
      
    ( Exact features such as color or linestyle aren't important in this question. A legend would be helpful, but isn't required. The plots may be parts of a shared figure, or may be separate plots. )
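
The reading and plotting steps of question 6 might be arranged as in the sketch below. The function names are hypothetical, the column order (date, value, anomaly) is taken from the fifth header row in the sample output, and the colors and line styles are arbitrary.

```python
import csv


def read_temperature_csv(csvfile, header_rows=5):
    """Return (headers, dates, temperatures, anomalies) from an open csv file."""
    reader = csv.reader(csvfile)
    headers = [next(reader) for _ in range(header_rows)]
    dates, temperatures, anomalies = [], [], []
    for row in reader:
        if not row:
            continue  # ignore blank rows
        dates.append(int(row[0]))          # e.g. 202004 for April of 2020
        temperatures.append(float(row[1]))
        anomalies.append(float(row[2]))
    return headers, dates, temperatures, anomalies


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def plot_with_average(x, y, label, filename, zero_line=False):
    """Plot y against x with a horizontal line at average(y), then save it."""
    import matplotlib.pyplot as plt  # imported here so the csv code works without it

    fig, ax = plt.subplots()
    ax.plot(x, y, label=label)
    ax.axhline(average(y), color="red", linestyle="--", label="average")
    if zero_line:
        ax.axhline(0, color="black", label="y = 0")
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
```

The file would be opened with `open(name, newline="")`, as the csv module recommends. If only the year is wanted for the x axis, `date // 100` recovers it from a year-and-month number.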
  7. Name this script "question" (or "q-url.py"). Write it to do each of the following:
    1. [5 pts] Read a URL (webpage) and save it as a binary file. The URL is:
      https://montcs.bloomu.edu/215/Images/f3
      Also print out the file's name and length, and the contents of the first 16 bytes.
    2. [5 pts] Use matplotlib to open the file you downloaded and saved, as an image file. Display the image on the screen. It should look like this: [figure: clouds]
    3. [5 pts] Using matplotlib, display each of the color layers (red, green, and blue) of the image.
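
The three steps of question 7 could be sketched as follows. Several things here are assumptions: the function names, the 10-second timeout, and above all that the saved file is an RGB image in a format matplotlib can read (the first 16 bytes help to check this, since for example PNG files begin with the bytes `\x89PNG`).

```python
import urllib.request


def download(url, filename):
    """Read the URL as bytes and save them unchanged in a binary file."""
    with urllib.request.urlopen(url, timeout=10) as response:
        data = response.read()
    with open(filename, "wb") as output:
        output.write(data)
    return data


def first_bytes(filename, count=16):
    """Return (file length in bytes, the first `count` bytes)."""
    with open(filename, "rb") as saved:
        data = saved.read()
    return len(data), data[:count]


def show_layers(filename):
    """Display the image, then its red, green, and blue layers side by side."""
    import matplotlib.pyplot as plt  # imported here so the file helpers work without it

    image = plt.imread(filename)  # assumed shape: (rows, columns, 3 or 4)
    fig, axes = plt.subplots(1, 4, figsize=(12, 3))
    axes[0].imshow(image)
    axes[0].set_title("original")
    for axis, index, name in zip(axes[1:], range(3), ("Reds", "Greens", "Blues")):
        axis.imshow(image[:, :, index], cmap=name)
        axis.set_title(name.lower()[:-1])
    plt.show()
```

After `download(url, filename)`, printing the filename together with the two values returned by `first_bytes(filename)` covers the output requested in step 1.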