Knowledge seeker's blog: 2012

Tuesday, October 30, 2012

Python programming Day 05

To convert from a string to a list of characters

>>> s = 'spam'
>>> t = list(s)
>>> print t
['s', 'p', 'a', 'm']

To break a string into words

>>> s = 'pining for the fjords'
>>> t = s.split()
>>> print t
['pining', 'for', 'the', 'fjords']

To convert all elements in a list to a string

>>> t = ['pining', 'for', 'the', 'fjords']
>>> delimiter = ' ' # the delimiter is a space character, so join puts a space between words
>>> delimiter.join(t)
'pining for the fjords'

References
Allen B. Downey, Think Python: How to Think Like a Computer Scientist,
http://www.greenteapress.com/thinkpython/html/index.html

Tuesday, July 31, 2012

UNIX/Linux/Ubuntu system administration DAY 02

Linux system commands

lastb : show details of failed logins
last : display the last login and logout details of all users
uptime: display how long the system has been running
who: list of users who are currently logged in to the system
w: display information about users currently logged in to the system
telinit: change run levels (e.g. single user mode: run level 1)
netstat -rn: check the default gateway on an unix/linux machine

Useful Linux commands

apropos: search for strings that exist in each Linux command e.g. apropos printing
cat /etc/shells: list all available shells in the system
set: display all environmental settings

Managing users and groups

/etc/passwd
/etc/groups: consists of a list of groups each in a different line

Managing partitions

fdisk: mange partition table
sfdisk: manage partition table
parted: a partition manipulation program
partprobe: inform the OS of partition table changes

Manage packages using RPM

rpm -i packagename: install a package
rpm -F packagename: reinstall a package
rpm -U packagename: upgrade a package
rpm -e package name: remove a package
rpm -qa list all packages that are installed on your system

YUM component acts as the front-end package installer for RPM

yum install R R-core R-core-devel R-devel //to install R in Linux/Centos

Web systems day 06-Topic: declaration

Common <!DOCTYPE> declaration

References

http://www.w3schools.com/ accessed date 31 July 2012

Monday, July 30, 2012

IT for Users in Organisations Day 01

Ways to increase productivity

higher production
improve quality
improve efficiency
improve effectiveness
automate manual tasks
empowering staff

Ways to improve customer service

increase customer satisfaction
extend customer services
monitor customer contacts – feedback and analysis
customer service online
automate manual tasks
empowering staff

References

Melbourne Institute of Technology (Semester 2, 2012). BN105 IT for Users in Organisations Lecture Notes.

Friday, July 20, 2012

UNIX/Linux/Ubuntu system administration DAY 01

Force new users change their password at their first log-in

top command shows more details than ps command. It is like running ps permanently while allowing users to change parameters or apply filters to the output.
sar command shows the system's activities in a period of time (e.g. "> sar 10 5" get 5 samples in a 10-second interval.).
"user@linuxhost:> find . > list_of_files.txt" creates a list of files and save it into list_of_files.txt.
"user@linuxhost:> find /user/local -xdev > list_of_files_in_local_folder.txt" creates a list of files in the folder /user/local and save it ist_of_files_in_local_folder.txt.
"user@linuxhost:> grep passwd list_of_files.txt" search the location of the file passwd.
"user@linuxhost:> ps uawx | head" shows the 10 processes using most of the CPU cycles.
"user@linuxhost:> man -s 4 passwd" access information contained in section 4 of the command passwd.
Create a new user (useradd), create the password for the user (passwd username) make sure the user's home directory exists and corresponds to the entry in the /etc/passwd file, verify that the user owns his or her home directory and startup files, and that they are readable (and, in the case of the home directory, executable). E.g. chown -R username /home/username; chgrp -R username /home/username.
"user@linuxhost:> id username" to display user/group information.
"user@linuxhost:> usermod -G admin -a username" add username to group admin.
"root@linuxhost:> shutdown -P now" to shutdown the machine right away.
Modify /etc/sudoers file in order to use command sudo. E.g. EDITOR="emacs" visudo; and use the usermod command at list number 10 above.

References

Erik M. Keller, Unix/Linux Survival Guide, Charles River Media, Inc. , 2006.

Wednesday, July 18, 2012

CSS Day 01

CSS Box model

CSS ID selector example

CSS Class selector example

CSS Syntax

CSS Pseudo class example

CSS Pseudo element example

References

http://www.w3schools.com/ accessed date 18 July 2012

Thursday, June 21, 2012

Web systems day 05-Topic: JavaScript Objects

I JavaScript Objects

1. Facts about JavaScript objects

An object is just a special kind of data. An object has properties and methods.
The String object is used to manipulate a stored piece of text.
The Date object is used to work with dates and times.
The Array object is used to store multiple values in a single variable.
The Boolean object is used to convert a non-Boolean value to a Boolean value (true or false).
The Math object allows you to perform mathematical tasks.
The regular expression (RegExp) Object.

2. Properties are the values associated with an object.

3. Methods are the actions that can be performed on objects.

4. String object

5. Date object

6. Array object

An array is a special variable, which can hold more than one value, at a time.

7. Boolean object

8. Math Object

9. RegExp Object

A regular expression is an object that describes a pattern of characters.
When you search in a text, you can use a pattern to describe what you are searching for.
A simple pattern can be one single character.
A more complicated pattern can consist of more characters, and can be used for parsing, format checking, substitution and more.
Regular expressions are used to perform powerful pattern-matching and "search-and-replace" functions on text.

II. JavaScript Advance

The Navigator object contains information about the visitor's browser.
A cookie is often used to identify a user.
A cookie is a variable that is stored on the visitor's computer. Each time the same computer requests a page with a browser, it will send the cookie too. With JavaScript, you can both create and retrieve cookie values.

1. JavaScript Browser Detection

The Navigator Object

2. JavaScript Cookies

References

Wendy Willard (2010). Web Design DeMYSTiFieD. McGraw-Hill Osborne Media

http://www.w3schools.com/ accessed date 21 June 2012

Wednesday, June 20, 2012

Web systems day 04-Topic: JavaScript

JavaScript Popup Boxes
JavaScript has three kind of popup boxes: Alert box, Confirm box, and Prompt box

An alert box is often used if you want to make sure information comes through to the user.
A confirm box is often used if you want the user to verify or accept something.
A prompt box is often used if you want the user to input a value before entering a page.

The following shows an example of an alert box.

The following shows an example of a confirm box.

The following shows an example of prompt box.

JavaScript Functions

Calling a Function with Arguments

Functions With a Return Value

JavaScript For...In Statement

JavaScript onMouseOver Event

The try...catch Statement
The try...catch statement allows you to test a block of code for errors. The try block contains the code to be run, and the catch block contains the code to be executed if an error occurs.

References

Wendy Willard (2010). Web Design DeMYSTiFieD. McGraw-Hill Osborne Media
http://www.w3schools.com/ accessed date 20 June 2012

Tuesday, June 19, 2012

Web systems day 03-Topic: JavaScript

JavaScript was invented by Brendan Eich at Netscape (with Navigator 2.0), and has appeared in all browsers since 1996.

Facts about JavaScript

JavaScript is an Object Based Programming language. Meaning that, it allows you to define your own objects and make your own variable types.
JavaScript is a sequence of statements to be executed by the browser
JavaScript was designed to add interactivity to HTML pages
JavaScript is a scripting language
A scripting language is a lightweight programming language
JavaScript is usually embedded directly into HTML pages
JavaScript is an interpreted language (means that scripts execute without preliminary compilation)
Everyone can use JavaScript without purchasing a license
JavaScript is case sensitive
Single line comments start with //.
Multi line comments start with /* and end with */.

Uses of JavaScript

Text animation
Graphic animation
Simple browser based application
HTML forms submission
Client-side forms data validation (relieving the server of this task)
Web site navigation
JavaScript can manipulate HTML elements - A JavaScript can read and change the content of an HTML element
JavaScript can be used to detect the visitor's browser - A JavaScript can be used to detect the visitor's browser, and - depending on the browser - load another page specifically designed for that browser
JavaScript can be used to create cookies - A JavaScript can be used to store and retrieve information on the visitor's computer

Employ JavaScript in your HTML documents

Create an external JavaScript (.js) file

collect commonly used functions together into external function libraries on the server

added benefit of privacy from curious users

Insert a JavaScript section that is in-line with your HTML documents
as an expression for an HTML tag attribute
within some HTML tags as Event Handlers

Manipulating HTML Elements

JavaScript is typically used to manipulate existing HTML elements.
The HTML "id" attribute is used to identify HTML elements.
To access an HTML element from a JavaScript, use the document.getElementById() method.
The document.getElementById() method will access the HTML element with the specified id.

Write Directly into The HTML Document

The example below writes a <p> element into the HTML document:

JavaScript Functions and Events

The JavaScript statement in the example above, is executed when the page loads, but that is not always what we want.
Sometimes we want to execute a JavaScript when an event occurs, such as when a user clicks a button.
Then we put the script inside a function.
Functions are normally used in combination with events.

JavaScript Functions in <head>

The example below calls a function when a button is clicked:

JavaScript Functions in <body>

This example also calls a function when a button is clicked, but the script is placed at the bottom of the page:

You can place an unlimited number of scripts in your document, and you can have scripts in both the body and the head section at the same time.

It is a common practice to put all functions in the head section, or at the bottom of the page. This way they are all in one place and do not interfere with page content.

Using an External JavaScript

JavaScript can also be placed in external files.
External JavaScript files often contain code to be used on several different web pages.
External JavaScript files have the file extension .js.
Note: External script cannot contain the <script></script> tags!
To use an external script, point to the .js file in the "src" attribute of the <script> tag:

JavaScript Variables

Variable names are case sensitive (y and Y are two different variables)
Variable names must begin with a letter, the $ character, or the underscore character
You declare JavaScript variables with the var keyword. E.g. var firstname;
To assign a value to the variable, use the equal (=) sign. E.g. firstname="Tom";
When you assign a text value to a variable, put quotes around the value.
When you assign a numeric value to a variable, do not put quotes around the value, if you put quotes around a numeric value, it will be treated as text.
If you redeclare a JavaScript variable, it will not lose its value.
Variables declared outside a function, become GLOBAL, and all scripts and functions on the web page can access it.
If you assign values to variables that have not yet been declared, the variables will automatically be declared as global variables.

JavaScript Arithmetic

As with algebra, you can do arithmetic operations with JavaScript variables.

The + Operator Used on Strings

The + operator can also be used to add string variables or text values together.

JavaScript Comparison and Logical Operators

JavaScript If...Else Statements

References

Wendy Willard (2010). Web Design DeMYSTiFieD. McGraw-Hill Osborne Media
http://www.w3schools.com/ accessed date 19 June 2012

Sunday, June 17, 2012

Operating Systems Day 08 File Manager

Files are made up of records. Records consist of fields.

File Manager is also called the file management system. It is software responsible for creating, deleting, modifying and controlling access to files. It also manages the resources used by the files.

File Manager performs its functions in collaboration with the Device Manager.

Responsibilities of File Manager include the following.

1. File storage tracking

Determine where and how files are stored

2. Implement storage management policies

Efficiently use available storage space
Provide efficient file access

3. File allocation: open() system call

Record file use

4. File access: read() system call

Accesses and returns data and tracks position in file

5. File de-allocation: close() system call

Return file to storage and make available for others

Field is a group of related bytes that can be identified by the user with a name, type, and size.

Record is a group of related fields.

File is a group of related records that contains information used by specific application programs to generate reports.

Database is a group of related files that are interconnected at various levels to give users flexibility of access to the stored data.

A program file contains instructions and a data file contains data.

Directories are special files with listings of filenames and their attributes.

Record format

Within each file, records are all presumed to have the same format. They may be (1) fixed length or (2) variable length.

whereas the variable-length formats are used in files accessed sequentially (text files, program files), or in files that use an index to access records.

Fixed-length record

Defined number of characters
Direct access is easy - use a formula to calculate byte position
Record size is critical

With fixed length fields, long fields are truncated

Ideal for simple data files

Variable-length record

Direct access: difficult
No empty storage space and no character truncation
File descriptor stores record format
Used with files accessed sequentially - text or program files
Used for files with index to access records
Cannot be used with Random (Direct) files

References

McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Friday, June 15, 2012

Web systems day 02-Topics: Designing for screens

Typical screen areas for common monitor sizes

Popular and widely supported web fonts

If you’re creating webpages for the general public, a good rule of thumb is to limit your main content pages to around 200K in size. HTML and CSS files typically weigh in at around 1 to 3K, so that leaves the vast majority of the total for other content such as graphics.

In fact, only 216 colors will display uniformly across all Mac and Windows systems (but even those colors can be affected by lighting and gamma issues). Those 216 colors are commonly referred to as “web-safe” colors.

Eye-tracking research

The above example is from a website's "About Us" pages. The heatmap clearly shows users' tendency to read in an "F" pattern, and their focus on information that's presented in bulleted lists. In this case, there's also a small amount of attention to the "see also" area, but no viewing of the promotions in the rightmost column.

Another eye-tracking research result

References

Wendy Willard (2010). Web Design DeMYSTiFieD. McGraw-Hill Osborne Media

http://www.useit.com/eyetracking/ accessed date 15 June 2012

Wednesday, June 13, 2012

Operating Systems Day 07

Seek time: the time required to position the read/write head on the proper track from the time the I/O request is issued.

First-come, first-served (FCFS): the simplest scheduling algorithm for direct access storage devices that satisfies track requests in the order in which they are received.

Shortest Seek Time First (SSTF): a scheduling strategy for direct access storage devices that’s used to optimize seek time. The track requests are ordered so the one closest to the currently active track is satisfied first and the ones farthest away are made to wait.

SCAN is a seek strategy in which it uses a directional bit to indicate whether the arm is moving toward the center of the disk or away from it. In this technique, the algorithm moves the arm methodically from the outer to the inner track, servicing every request in its path. When it reaches the innermost track, it reverses direction and moves toward the outer tracks, again servicing every request in its path. In addition, SCAN strategy moves to extreme tracks even when there is no need.

LOOK is a seek strategy which is the most common variation of SCAN. In this strategy, the arm does not necessarily go all the way to either edge unless there are requests, hence eliminating the possibility of indefinite postponement.

Direct Memory Access (DMA) technique that allows a control unit to access main memory directly and transfer data without the intervention of the CPU. This technique is used for high-speed devices such as disks. Without DMA, the CPU is responsible for the physical movement of data between main memory and the device, a time-consuming task that results in significant overhead and decreased CPU utilization.

1. FCFS effective with light loads

Service time unacceptably long under high loads

2. SSTF effective with moderate loads

Localization problem under heavy loads

3. SCAN effective with light to moderate loads

Eliminates indefinite postponement

4. C(Cicular)-SCAN effective with moderate to heavy loads

Very small service time variances

References

McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Tuesday, June 5, 2012

Web systems day 01-Topics: Ajax

What is the goal of the use of Ajax (Asynchronous JavaScript + XML) in a Web application?

Goal of Ajax is to provide Web-based applications with responsiveness approaching that of desk-top applications.

Non-Ajax applications often have only one or a few server-side response programs, each of which produces significant content.

Ajax applications often have a much larger number of response programs, each of which is small and handles only requests for changes in one small part of a web page (e.g. Google maps web app).

Ajax applications often return JavaScript code to the client. Such code could be modified by an intruder to include destructive operations. To protect against this, any JavaScript code returned by the server must be scanned before it is interpreted.

Ajax is much faster for web applications that require user and server interactions.

Ajax requests and receives a small part of a document, hence results in much faster responses.

Ajax does not use any new programming languages or markup languages

AJAX is a technique for creating fast and dynamic web pages.

Client side: JavaScript, XML, XHTML, DOM, CSS
Server side: any (PHP, servlets, ASP.NET, etc.)

AJAX is based on internet standards, and uses a combination of:

XMLHttpRequest object (to exchange data asynchronously with a server)
JavaScript/DOM (to display/interact with the information)
CSS (to style the data)
XML (often used as the format for transferring data)

References

Robert W. Sebesta (2010). Programming the World Wide Web (5th ed.). Pearson Education, Inc.
http://www.w3schools.com/ accessed date 7 July 2012

Sunday, May 27, 2012

Multimedia Day 06

Ethics
Copyright

Jury deals blow to Oracle v Google case: no infringement http://bit.ly/JYCzBz

Fair use
Infringement
Privacy
Digital rights management
Censorship
Intellectual property
Public domain means either that the work was never copyrighted in the first place or its copyright protection has expired over time and not been renewed; you can use public domain material without a license.

Saturday, May 19, 2012

Neural networks for time-series forecasting Day 2

Neural networks can be classified into dynamic and static categories.

Static (feedforward) networks have no feedback elements and contain no delays; the output is calculated directly from the input through feedforward connections.

In dynamic networks, the output depends not only on the current input to the network, but also on the current or previous inputs, outputs, or states of the network.

Dynamic networks can also be divided into two categories: those that have only feedforward connections, and those that have feedback, or recurrent, connections.

References

Grace Rumantir, Monash University FIT5167 Lecture Notes, 2012.
MATLAB 2011b Help Documentation

Thursday, May 17, 2012

CRM and data mining Day 10

For each algorithm, discuss what it is, its (perceived) strengths and (perceived) weaknesses.

a) (unsupervised and supervised) Classification learning
Classification learning problems refer to problems that are trying to classify unknown instances given a set of known instances.

b) Decision Trees (and Decision Graphs)
Decision Trees problems refer to problems that represent class labels in terms of a tree. For each branch of the tree there is a classification rule.

c) Naive Bayes
Naive Bayes problems refer to problems using probabilistic approach to solving classification problems.

d) Clustering
Clustering involves segmenting a data set in such a way that all members in a segment will have similar characteristics, and one segment will be different from another in terms of characteristics.

e) Segmentation
Segmentation is a method that is used to group individual instances/records in such a way that they have similar attributes.

f) k-means algorithm
k-means algorithm is used to divide a data set into many subsets of data. Each subset is often represented by a centroid. K indicates a number of subsets that are obtained.

g) Self organising maps
Self organising maps is an algorithm that group similar records according to some biological rules.

h) Memory based reasoning (and k nearest neighbour algorithm)
Memory-based reasoning and collaborative filtering (CF) are nearest neighbour approaches
Nearest neighbour techniques are based on the concept of similarity.
Memory-Based Reasoning (MBR) results are based on similar situations in the past.

Strengths

Ability to adapt
Good results without lengthy training
Ability to use data "as is", meaning that the format of records won't create a problem. Be able to utilize both a distance function and a combination function between data records to help determine how similar they are.

Weaknesses

Many samples are required.
Deciding a 'good' distance metric

i) Market basket analysis

j) Association rules

k) Neural networks

l) Recommender systems
Systems that make recommendations

m) Collaborative filtering systems
Recommender systems based on ratings of people who have made similar ratings on items the user has rated.

n) Content-based filtering
Recommender systems based on properties of the items and of the user.

o) Supervised learning
Supervised learning is a computer learning approach where it is presented a dataset with class labels. The computer program then tries to derive a rule/pattern for each class label.

p) Unsupervised learning
Unsupervised learning is a computer learning approach where it is presented a dataset without class labels, and try to segment it into different labelled segments.

q) Lift
A rule interestingness measure
The value of confidence(L → R) measures the support for R if we only examine the transactions that match L. So purchasing the items in L makes it 0.864/0.125 = 6.91 times more likely that
the items in R are purchased.

Lift values greater than 1 are ‘interesting’. They indicate that transactions containing L tend to contain R more often than transactions that do not contain L.

Although lift is a useful measure of interestingness it is not always the best one to use. In some cases a rule with higher support and lower lift can be more interesting than one with lower support and higher lift because it applies to more cases.

r) Leverage
A rule interestingness measure

Another measure of interestingness that is sometimes used is leverage. It measures the difference between the support for L ∪ R (i.e. the items in L and R occurring together in the database) and the support that would be expected if L and R were independent.

The former is just support(L ∪R). The frequencies (i.e. supports) of L and R are support(L) and support(R), respectively. If L and R were independent the expected frequency of both occurring in the same transaction would be the product of support(L) and support(R).

This gives a formula for leverage:

leverage(L → R) = support(L ∪ R) − support(L) × support(R).

The value of the leverage of a rule is clearly always less than its support.

s) EM clustering algorithm
Expectation Maximisation (EM) clustering algorithm employs ...

t) What are the two main definitions of a Data Warehouse according to W. H. Inmon and Ralph Kimball?

According to W.H. Inmon "Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process"

According to Ralph Kimball "Data Warehouse is a copy of transaction data, specifically structured for query and analysis."

u) Describe the following concepts related to data warehouse design:

(a) granularity
(b) data partitioning
(c) subject orientation

v) Define Dimension tables and Fact tables in a dimensional data warehouse model.
Dimension tables in a dimensional data warehouse model refer to

Fact tables in a dimensional data warehouse model consists of measures.

w) How is a dimensional model different from the Enterprise data warehouse model suggested by Inmon?

References

FIT5158 Monash University Lecture Notes, 2012

Tuesday, May 15, 2012

CRM and data mining Day 09

For cluster analysis, the question is how to evaluate the “goodness” of the resulting clusters?

But “clusters are in the eye of the beholder”!

Then why do we want to evaluate them?

To compare clustering algorithms
To avoid finding patterns in noise
To compare two sets of clusters
To compare two clusters

Different Aspects of Cluster Validation

Determining the clustering tendency of a set of data, i.e., distinguishing whether non-random structure actually exists in the data.
Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels.
Evaluating how well the results of a cluster analysis fit the data without reference to external information.

Use only the data

Comparing the results of two different sets of cluster analyses to determine which is better.
Determining the ‘correct’ number of clusters.

For 2, 3, and 4, we can further distinguish whether we want to evaluate the entire clustering or just individual clusters.

Measures of Cluster Validity

Internal Measures: Cohesion and Separation

Cluster Cohesion: measures how closely related are objects in a cluster.

Example: SSE

Cluster Separation: measure how distinct or well-separated a cluster is from other clusters.

Separation is measured by the between cluster sum of square

Silhouette Coefficient combines ideas of both cohesion and separation, but for individual points, as well as clusters and clusterings.

For an individual point, i

Calculate a = average distance of i to the points in its cluster

Calculate b = min (average distance of i to points in another cluster)

The silhouette coefficient for a point is then given by

s = 1 – a/b if a < b,

s = b/a - 1 if a > b, not the usual case

Silhouette coefficient is typically between 0 and 1. The closer to 1 the better.

Can calculate the Average Silhouette width for a cluster or a clustering.

Thursday, May 10, 2012

Operating Systems Day 06

Problems that arise when many processes compete for relatively few resources and the system is unable to service all of the processes. In particular, there are two extreme conditions: deadlock and starvation which result from a lack of process synchronization.

Problems may arise if the system is unable to service all of the processes in a synchronous manner.

Deadlock is more serious than starvation because it affects more than one job. There are seven cases of deadlock.

Case 1: Deadlocks on File Requests. Deadlock from multiple process file requests occurs because of jobs being allowed to hold files for the duration of their executions.
Case 2: Deadlocks in Databases. Deadlock may occur if two processes access and lock records in a database. The concept of locking within a database whereby one user locks out all other users while working with the database.
Case 3: Deadlocks in Dedicated Device Allocation. Deadlock may occur if there are a limited number of dedicated devices and many more processes requesting them.
Case 4: Deadlocks in Multiple Device Allocation. Deadlock may occur with multiple device allocations utilizing different device types when several processes request and hold dedicated devices while other processes act in a similar manner.
Case 5: Deadlocks in Spooling. Deadlock may occur if a sharable device, such as a spooling printer that requires all job output before initiating printing, encounters a full spool disk area with only partially completed output residing on it.
Case 6: Deadlocks in a Network. Deadlock in a network resulting from the absence of network protocols to control message flow.
Case 7: Deadlocks in Disk Sharing.

Deadlock occurs due to over-commitment of resources. Four conditions that need to be simultaneously present for deadlock to occur: mutual exclusion, resource holding, no preemption, and circular wait. Each of these four individual conditions is necessary for the operating system to work smoothly, and none of them can be removed easily without causing the system’s overall functioning to suffer. Therefore, the system needs to recognize the combination of conditions before they occur and threaten to cause the system to lock up.

Starvation occurs when allocation of resources is restricted.

When a starvation occurs, a job never completes because a requested resource is not allocated.

Causes of Deadlock

Deadlock occurs due to over-commitment of resources

There are four methods for dealing with deadlocks: prevention, avoidance, detection, recovery.

Directed graphs are used to model the four simultaneous conditions necessary for deadlock.

In directed graphs, a solid arrow from a resource to process represents process is holding the resource.

Avoidance: the dynamic strategy of deadlock avoidance that attempts to ensure that resources are never allocated in such a way as to place a system in an unsafe state.

Circular wait: one of four conditions for deadlock through which each process involved is waiting for a resource being held by another; each process is blocked and can’t continue, resulting in deadlock.

Mutual exclusion: one of four conditions for deadlock in which only one process is allowed to have access to a resource.

Process synchronization: (1) the need for algorithms to resolve conflicts between processors in a multiprocessing environment; or (2) the need to ensure that events occur in the proper order even if they are carried out by several processes.

Race: a synchronization problem between two processes vying for the same resource.

Starvation: the result of conservative allocation of resources in which a single job is prevented from execution because it’s kept waiting for resources that never become available.

The following figure shows a deadlock event using a directed graph.

Use Directed graphs

Circles represent processes
Squares represent resources
Solid arrow from resource to process

Process holding resource

Solid arrow from a process to resource

Process waiting for resource

Arrow direction indicates flow
Dashed arrow indicates Process requests Resource

Cycle in graph indicates deadlock condition

Point out that deadlock results from the liberal allocation of resources, while starvation results from conservative allocation of resources.

Strategies for handling deadlocks

Prevention: prevent one of four neccessary deadlock conditions (i.e. no preemption, circular wait, mutual exclusion, resource holding) from occuring
Avoidance
Detection

References

McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Multimedia Day 05

Animation is possible because of persistence of vision.
Keyframes show the start and end of some action
The process of filling in the action is called tweening.
Morphing is the process of transitioning from one image to another
Kinematics is the study of movement of objects that have joints, such as people.

Monday, May 7, 2012

Neural networks for time-series forecasting Day 1

Time series data can be decomposed into four possible components:

the Trend component reflecting the long term progression of the series.
the Cyclical component that describes repeated but non-periodic fluctuations, possibly caused by the economic cycle.
the Seasonal component reflecting seasonality (Seasonal variation)
the Irregular component (or "noise") that describes random, irregular influences. Compared to the other components it represents the residuals of the time series.

Time series forecasting is the use of a model to forecast future events based on known past events. Meaning that, it is used to predict data points before they are measured.

Autocorrelation is the correlation of the value of a data item at a particular time with the values of previous data items.

The number of time steps back into the past that have data items with high autocorrelation to the future data item is called time lags.

Given the water demand data of the previous 12 months as input, a multi-layer perceptron model can be built to forecast the value of the random component of the water demand in the following month.

Spatio-temporal model not only incorporates time lags of the variable whose future values are of interest, but also other influential variables, and possibly their own lags. For example, for the water demand time series data, rainfall is seen as a lag indicator of water demand.

The output of the MLP is added to the extrapolations of the trend and seasonal components to get the forecast of water demand for the following month.

Perceptron for forecasting of linear time-series

Linear Auto-Regressive with Exogenous (external) input variables networks (ARx)

Neural networks for non-linear time-series forecasting

Time-lagged feedforward networks (sometimes are also called focused time-lagged networks)
Dynamically-driven recurrent (feedback) networks

A recurrent network feeds back the predicted output through feedback connection into the input layer with one unit time delay.

A Non-linear Auto-Regressive with Exogenous input variables (NARx) network model is a recurrent network that, on top of the variable being predicted, uses other external/exogenous variables as inputs.

MLP for forecasting Multi-step-ahead forecasting

A Non-linear Auto-Regressive with Exogenous (external) input variables networks (NARx).

The nonlinear autoregressive network with exogenous inputs (NARX) is a recurrent dynamic network, with feedback connections enclosing several layers of the network.

Single step vs. Multi-step-ahead forecasting

Time-lagged and recurrent networks

Performance of ARx and NARx networks

In Spatio-Temporal Time-Lagged networks, the response of the time-series is affected by one or more time-series.

Dynamic networks can be divided into two categories: those that have only feedforward connections, and those that have feedback, or recurrent connections.

Recurrent-dynamic networks typically have a longer response than feedforward-dynamic networks.

For linear networks, feedforward-dynamic networks are called finite impulse response (FIR), because the response to an impulse input will become zero after a finite amount of time.

Linear recurrent-dynamic networks are called infinite impulse response (IIR), because the response to an impulse can decay to zero (for a stable network), but it will never become exactly equal to zero.

An impulse response for a nonlinear network cannot be defined, but the ideas of finite and infinite responses do carry over.

Dynamic networks are generally more powerful than static networks (although somewhat more difficult to train). Because dynamic networks have memory, they can be trained to learn sequential or time-varying patterns.

References

Grace Rumantir (2012). FIT5167 Lecture Notes. Monash University
MATLAB 2011b Help Documentation

Thursday, April 26, 2012

Operating Systems Day 05

Multiprogramming: a technique that allows a single processor to process several programs residing simultaneously in main memory and interleaving their execution by overlapping I/O requests with CPU requests.

Nonpreemptive scheduling policy: a job scheduling strategy that functions without external interrupts so that once a job captures the processor and begins execution, it remains in the running state uninterrupted until it issues an I/O request or it’s finished.

Preemptive scheduling policy: any process scheduling strategy that, based on predetermined policies, interrupts the processing of a job and transfers the CPU to another job. It is widely used in time-sharing environments.

Nonpreemptive scheduling policy includes the following process scheduling schemes:

First Come First Serve (FCFS)
Shortest Job Next (SJN)
Priority scheduling

Preemptive scheduling policy includes the following process scheduling schemes:

Shortest Remaining Time (SRT)
Round Robin
Round Robin with Priority

The Operating System tracks jobs by means of Process Control Blocks (PCB’s). Each process has its own PCB.

Process Control Block (PCB) components

Process identification (PID)

Unique

Process status

Status (HOLD, READY, RUNNING, WAITING)

Process state

Process status word register contents, main memory info (segment table, page table), resources (files), process priority

Accounting

Billing and performance measurements
CPU time, total time, memory occupancy, I/O operations, number of input records read, etc.

Multiple-level queues are used for both Nonpreemptive and Preemptive policies.

Interrupts usually temporarily suspend execution of a process

The following figure shows the queuing paths from HOLD to FINISHED process states.

Good process scheduling policy criteria are

Maximise throughput: Run as many jobs as possible
Minimise response time: Quickly handle interactive requests
Minimise turnaround time: Move job through system quickly
Minimise waiting time: Move job out of READY queue quickly
Maximise CPU efficiency: Keep CPU busy 100 percent of time
Ensure fairness for all jobs: Give every job equal CPU and I/O time

References

McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Thursday, April 19, 2012

Operating Systems Day 04

Virtual memory: a technique that allows programs to be executed even though they are not stored entirely in memory.

Virtual memory’s important advantages such as

A job’s size is no longer restricted to the size of main memory.
It eliminates external fragmentation and minimizes internal fragmentation.
It allows the sharing of code and data.
It facilitates dynamic linking of program segments, and an unlimited amount of multiprogramming is possible.

The disadvantages of virtual memory include

increased processor hardware costs,
increased overhead for handling paging interrupts, and
increased software complexity to prevent thrashing.

Cache memory: a small, fast memory used to hold selected data and to provide faster access than would otherwise be possible.

Thrashing: a phenomenon in a virtual memory system where an excessive amount of page swapping back and forth between main memory and secondary storage results in higher overhead and little useful work.

Memory Map Table (MMT): a table in main memory that contains as many entries as there are page frames and lists the location and free/busy status for each one.

Page: a fixed-size section of a user’s job that corresponds in size to page frames in main memory.

Paged memory allocation: a memory allocation scheme based on the concept of dividing a user’s job into sections of equal size to allow for noncontiguous program storage during execution.

Demand paging: a memory allocation scheme that loads a program’s page into memory at the time it is needed for processing.

Page Map Table (PMT): a table in main memory with the vital information for each page including the page number and its corresponding page frame memory address.

Page fault: a type of hardware interrupt caused by a reference to a page not residing in memory. The effect is to move a page out of main memory and into secondary storage so another page can be moved into memory.

Working set: a collection of pages to be kept in main memory for each active process in a virtual memory environment.

Processor Manager allocates CPU among all users.

Processor Manager manages and allocates a single central processing unit (CPU) to execute the incoming jobs.

The difference between job scheduling and process scheduling, and how they relate.

The advantages and disadvantages of process scheduling algorithms that are preemptive versus those that are nonpreemptive.

Up to six different process scheduling algorithms.

Explain the two states in a single-user system (busy and idle).

Understand the difference between a job and a process.

Explanations of the terms program or job, process, thread, processor, interrupt, higher priority, and context switch.

Program (Job)

instructions defining job to be performed, usually stored as a file
a ‘unit of work’ submitted by a user
passive entity

Process (task)

single instance of a running program
active entity

Requires resources to perform function
CPU + registers + memory (code & data) + devices (files…)

Thread

‘lightweight’ process
runs independently

Processor

Central Processing Unit (CPU)
performs calculations and executes programs (code)

Interrupt

Notification of important event
Activates higher-priority program (interrupt handler)
Supported by CPU hardware…

Context Switch

Occurs when changing from one process to another
Saves process information (registers) when interrupted

Job is divided into a set of pages with a fixed size for each page.

Each frame in the main memory has the same size as each page?

Physical address = starting address of the frame + the offset

For 16-bit address

The first 6 bits are for identifying the page number.

The last 10 bits are for offset.

Objective: Minimise Page Faults

Replace a page that isn’t going to be wanted again
Avoid having to save the page

Algorithms:

First In - First Out (FIFO):

Simple queue, oldest frame at head of the queue

Least Recently Used (LRU):

List in time sequence
This algorithm works on the following principle: when a page is referenced, a referenced bit is set for that page, marking it as referenced. Similarly, when a page is modified (written to), a modified bit is set. The setting of the bits is usually done by the hardware, although it is possible to do so on the software level as well.
At a certain fixed time interval, the clock interrupt triggers and clears the referenced bit of all the pages, so only pages referenced within the current clock interval are marked with a referenced bit. When a page needs to be replaced, the operating system divides the pages into four classes:

not referenced, not modified
not referenced, modified
referenced, not modified
referenced, modified

Although it does not seem possible for a page to be not referenced yet modified, this happens when a class 3 page has its referenced bit cleared by the clock interrupt. The NRU algorithm picks a random page from the lowest category for removal. Note that this algorithm implies that a modified (within clock interval) but not referenced page is less important than a not modified page that is intensely referenced.

Least Frequently Used (LFU):

List by frequency of use

Working set:

The collection of pages a process is using actively
Must be memory-resident to prevent thrashing
It changes over process execution
Golden rule: never remove from memory pages of the working set

Cache memory: small high-speed intermediate memory unit

Cache memory is not accessible by the programmer
Performance of computer system usually increased
Memory access time usually significantly reduced
Faster processor access compared to memory
Stores frequently used data and instructions
Two levels of cache

L2: Connected to CPU; contains copy of bus data
L1: Pair built into CPU; one stores instructions and one data

Four cache memory design factors:

Cache size, block size, block replacement method, rewrite policy
Optimal selection of cache and replacement method necessary
May lead to 80-90% of all requests in cache

Efficiency measures

Cache hit ratio (h)

Percentage of total cache hits

Miss ratio (1-h)
Average memory access time

AvgCacheAccessTime + (1-h) * AvgMemACCTime

Cache transfer method

CPU puts the address of a memory location in the Memory Address Register and requests data or an instruction to be retrieved from that address
A test is performed to determine if the block containing this address is already in a Cache slot

YES (hit): Transfer the information to CPU register – DONE

Much faster than accessing main memory

NO (miss): Get the block containing that address from Memory

Allocate a free Cache block slot to the block
Perform these in parallel: Transfer the data to CPU

Load the block into slot - DONE

A bit slower than accessing main memory

The segmented approach to memory management

Programs often divide memory into areas

Code: may have several subroutines or modules
Data: may be organised into variables tables, lists…

Overheads

Each segment must have own base and limit registers
Each segment has its own page table

The segmented memory allocation scheme, in which each job is divided into several segments of different sizes, one for each module that contains pieces that perform related functions. Note that it is based upon the concept of programming modules.

The notion of placing modules in different sized segments. Explain how this concept reduces page faults using the loop with a program example. Note that this is fundamentally different from a paging scheme that uses pages of the same size. Also note that memory is allocated dynamically.

The role of the Segment Map Table (SMT) and the information contained in it, such as segment numbers, their lengths, access rights, status, and (when each is loaded into memory) its location in memory.

References

McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Tuesday, April 17, 2012

CRM and data mining Day 08

The following figure shows data mining at the intersection of many disciplines.

The following figure shows different data types that we encounter in data mining.

The following figure shows a taxonomy of data mining.

The following figures shows the data mining process.

The following figures shows the data preparation process.

The following figures shows the data mining (SEMMA) process.

Split the data into 2 mutually exclusive sets training (~70%) and testing (30%).

For ANN, the data is split into three sub-sets (training [~60%], validation [~20%], testing [~20%]).

The following figures shows the ROC curve.

Gini index determines the purity of a specific class as a result of a decision to branch along a particular attribute/value (e.g.

CART)

Information gain uses entropy to measure the extent of uncertainty or randomness of a particular attribute/value split (e.g.

ID3, C4.5, C5)

Algorithms are available for generating association rules.

Apriori
Eclat
FP-Growth
Derivatives and hybrids of the three

The following figure shows biological versus artificial neural networks.

References

FIT5158 Monash University Lecture Notes, 2012
Andy Oppel (2011), Database Demystified, 2nd Ed, McGraw-Hill.
Efrain Turban et. al. (2011), Business Intelligence, A Managerial Approach, 2nd Ed, Prentice Hall.