Thursday, April 26, 2012

Operating Systems Day 05

Multiprogramming: a technique that allows a single processor to process several programs residing simultaneously in main memory and interleaving their execution by overlapping I/O requests with CPU requests.

Nonpreemptive scheduling policy: a job scheduling strategy that functions without external interrupts so that once a job captures the processor and begins execution, it remains in the running state uninterrupted until it issues an I/O request or it’s finished.

Preemptive scheduling policy: any process scheduling strategy that, based on predetermined policies, interrupts the processing of a job and transfers the CPU to another job. It is widely used in time-sharing environments.

Nonpreemptive scheduling policy includes the following process scheduling schemes:
  1. First Come First Serve (FCFS)
  2. Shortest Job Next (SJN)
  3. Priority scheduling
Preemptive scheduling policy includes the following process scheduling schemes:
  1. Shortest Remaining Time (SRT)
  2. Round Robin
  3. Round Robin with Priority
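As a quick illustration of the nonpreemptive schemes above, here is a minimal Python sketch comparing FCFS and SJN; the burst times are hypothetical, and all jobs are assumed to arrive at time 0:

```python
# Minimal sketch: average turnaround time under FCFS vs. SJN
# (nonpreemptive; all jobs assumed to arrive at time 0).

def avg_turnaround(burst_times):
    """Turnaround = completion time when jobs run back to back."""
    clock, total = 0, 0
    for burst in burst_times:
        clock += burst          # job runs to completion, uninterrupted
        total += clock          # its turnaround is the current clock
    return total / len(burst_times)

jobs = [6, 3, 8, 1]                 # hypothetical CPU burst times
fcfs = avg_turnaround(jobs)         # run in arrival order
sjn = avg_turnaround(sorted(jobs))  # run shortest job next

print(fcfs)  # 12.5 -> (6 + 9 + 17 + 18) / 4
print(sjn)   # 8.25 -> (1 + 4 + 10 + 18) / 4
```

SJN always minimises average turnaround when all jobs are available at once, which the sketch makes visible.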
The Operating System tracks jobs by means of Process Control Blocks (PCBs). Each process has its own PCB.

Process Control Block (PCB) components
  • Process identification (PID)
    • Unique 
  • Process status
    • Status (HOLD, READY, RUNNING, WAITING)
  • Process state
    • Process status word register contents, main memory info (segment table, page table), resources (files), process priority
  • Accounting
    • Billing and performance measurements
    • CPU time, total time, memory occupancy, I/O operations, number of input records read, etc.
Multiple-level queues are used for both Nonpreemptive and Preemptive policies.

Interrupts usually temporarily suspend execution of a process

The following figure shows the queuing paths from HOLD to FINISHED process states.


Good process scheduling policy criteria are
  1. Maximise throughput: Run as many jobs as possible
  2. Minimise response time: Quickly handle interactive requests
  3. Minimise turnaround time: Move job through system quickly
  4. Minimise waiting time: Move job out of READY queue quickly
  5. Maximise CPU efficiency: Keep CPU busy 100 percent of time
  6. Ensure fairness for all jobs: Give every job equal CPU and I/O time
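Round Robin, the classic preemptive scheme, can be sketched in a few lines; the quantum and burst times below are hypothetical:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Return the completion time of each job under Round Robin."""
    queue = deque((i, b) for i, b in enumerate(bursts))
    clock = 0
    done = [0] * len(bursts)
    while queue:
        i, remaining = queue.popleft()
        run = min(quantum, remaining)   # preempt after one quantum
        clock += run
        if remaining > run:
            queue.append((i, remaining - run))  # back of READY queue
        else:
            done[i] = clock
    return done

print(round_robin([6, 3, 1], quantum=2))  # [10, 8, 5]
```

Note how the short job (burst 1) finishes early even though it arrived last in the queue, which is why Round Robin gives good response time for interactive work.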
References
  1. McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
  2. Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Thursday, April 19, 2012

Operating Systems Day 04

Virtual memory: a technique that allows programs to be executed even though they are not stored entirely in memory.

Virtual memory’s important advantages include:
  • A job’s size is no longer restricted to the size of main memory.
  • It eliminates external fragmentation and minimizes internal fragmentation.
  • It allows the sharing of code and data.
  • It facilitates dynamic linking of program segments, and an unlimited amount of multiprogramming is possible.
The disadvantages of virtual memory include 
  • increased processor hardware costs, 
  • increased overhead for handling paging interrupts, and 
  • increased software complexity to prevent thrashing.
Cache memory: a small, fast memory used to hold selected data and to provide faster access than would otherwise be possible.

Thrashing: a phenomenon in a virtual memory system where an excessive amount of page swapping back and forth between main memory and secondary storage results in higher overhead and little useful work.

Memory Map Table (MMT): a table in main memory that contains as many entries as there are page frames and lists the location and free/busy status for each one.

Page: a fixed-size section of a user’s job that corresponds in size to page frames in main memory.

Paged memory allocation: a memory allocation scheme based on the concept of dividing a user’s job into sections of equal size to allow for noncontiguous program storage during execution.

Demand paging: a memory allocation scheme that loads a program’s page into memory at the time it is needed for processing.

Page Map Table (PMT): a table in main memory with the vital information for each page including the page number and its corresponding page frame memory address.

Page fault: a type of hardware interrupt caused by a reference to a page not residing in memory. The effect is to move a page out of main memory and into secondary storage so another page can be moved into memory.

Working set: a collection of pages to be kept in main memory for each active process in a virtual memory environment.

The Processor Manager manages and allocates a single central processing unit (CPU) among all users to execute the incoming jobs.

Key points for this topic:
  • The difference between job scheduling and process scheduling, and how they relate.
  • The advantages and disadvantages of preemptive versus nonpreemptive process scheduling algorithms.
  • Up to six different process scheduling algorithms.
  • The two states in a single-user system (busy and idle).
  • The difference between a job and a process.
  • The terms program (job), process, thread, processor, interrupt, higher priority, and context switch.

Program (Job)
  • instructions defining job to be performed, usually stored as a file
  • a ‘unit of work’ submitted by a user
  • passive entity
Process (task)
  • single instance of a running program
  • active entity
    • Requires resources to perform function
    • CPU + registers + memory (code  & data) + devices (files…)
Thread
  • ‘lightweight’ process
  • runs independently
Processor
  • Central Processing Unit (CPU)
  • performs calculations and executes programs (code)
Interrupt
  • Notification of important event
  • Activates higher-priority program (interrupt handler)
  • Supported by CPU hardware…
Context Switch
  • Occurs when changing from one process to another
  • Saves process information (registers) when interrupted
A job is divided into a set of pages with a fixed size for each page.
Each frame in main memory has the same size as each page.

Physical address = starting address of the frame + the offset 

For a 16-bit address:
The first 6 bits identify the page number.
The last 10 bits are the offset.
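The bit split above, together with "physical address = starting address of the frame + the offset", can be sketched directly; the page table contents here are hypothetical:

```python
PAGE_BITS = 10               # offset width from the notes: last 10 bits
PAGE_SIZE = 1 << PAGE_BITS   # 1024 bytes per page

def split_address(logical):
    """Split a 16-bit logical address into (page number, offset)."""
    page = logical >> PAGE_BITS         # first 6 bits
    offset = logical & (PAGE_SIZE - 1)  # last 10 bits
    return page, offset

# Hypothetical Page Map Table: page number -> frame number
page_table = {0: 5, 1: 2, 2: 7}

def translate(logical):
    page, offset = split_address(logical)
    frame = page_table[page]            # a missing entry = page fault
    return frame * PAGE_SIZE + offset   # frame start + offset

print(split_address(0x0803))  # (2, 3): page 2, offset 3
print(translate(0x0803))      # 7 * 1024 + 3 = 7171
```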

Objective: Minimise Page Faults
  • Replace a page that isn’t going to be wanted again
  • Avoid having to save the page
Algorithms:
  • First In - First Out (FIFO):
    • Simple queue, oldest frame at head of the queue
  • Least Recently Used (LRU):
    • List in time sequence; replace the page unused for the longest time
  • Not Recently Used (NRU):
    • This algorithm works on the following principle: when a page is referenced, a referenced bit is set for that page, marking it as referenced. Similarly, when a page is modified (written to), a modified bit is set. The setting of the bits is usually done by the hardware, although it is possible to do so on the software level as well.
    • At a certain fixed time interval, the clock interrupt triggers and clears the referenced bit of all the pages, so only pages referenced within the current clock interval are marked with a referenced bit. When a page needs to be replaced, the operating system divides the pages into four classes:
      • Class 0: not referenced, not modified
      • Class 1: not referenced, modified
      • Class 2: referenced, not modified
      • Class 3: referenced, modified
    • Although it does not seem possible for a page to be modified yet not referenced, this happens when a class 3 page has its referenced bit cleared by the clock interrupt. The NRU algorithm picks a random page from the lowest-numbered nonempty class for removal. Note that this implies that a page modified (within the clock interval) but not referenced is less important than an intensely referenced page that was not modified.
  • Least Frequently Used (LFU):
    • List by frequency of use
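A minimal sketch of fault counting for FIFO and LRU over a hypothetical reference string (the classic example used to show Belady's anomaly):

```python
from collections import OrderedDict

def count_faults(reference_string, num_frames, policy="FIFO"):
    """Count page faults for FIFO or LRU over a reference string."""
    frames = OrderedDict()   # insertion order doubles as eviction order
    faults = 0
    for page in reference_string:
        if page in frames:
            if policy == "LRU":
                frames.move_to_end(page)  # hit refreshes recency
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)    # evict head of the list
        frames[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_faults(refs, 3, "FIFO"))  # 9
print(count_faults(refs, 3, "LRU"))   # 10
```

For FIFO the list is a simple queue with the oldest frame at the head; for LRU a hit moves the page to the tail, so the head is always the least recently used page.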
Working set:
  • The collection of pages a process is using actively
  • Must be memory-resident to prevent thrashing
  • It changes over process execution
  • Golden rule: never remove from memory pages of the working set
Cache memory: small high-speed intermediate memory unit
  • Cache memory is not accessible by the programmer
  • Performance of computer system usually increased
  • Memory access time usually significantly reduced
  • Faster processor access compared to memory
  • Stores frequently used data and instructions
  • Two levels of cache
    • L1: Pair built into the CPU; one stores instructions and one data
    • L2: Connected to the CPU; contains a copy of bus data
Four cache memory design factors:
  • Cache size, block size, block replacement method, rewrite policy
  • Optimal selection of cache and replacement method necessary
  • May lead to 80-90% of all requests in cache
Efficiency measures
  • Cache hit ratio (h) 
    • Percentage of total cache hits
  • Miss ratio (1-h)
  • Average memory access time
    • AvgCacheAccessTime + (1-h) * AvgMemAccessTime
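The formula above assumes the cache is always probed first and a miss adds a full memory access. A quick evaluation with hypothetical timings:

```python
def avg_access_time(hit_ratio, cache_time, memory_time):
    """Every access probes the cache; a miss adds a memory access."""
    return cache_time + (1 - hit_ratio) * memory_time

# Hypothetical timings: 10 ns cache, 100 ns main memory, 90% hit ratio
print(avg_access_time(0.9, 10, 100))  # ~20.0 ns
```

Even a modest 90% hit ratio brings the average down from 100 ns to roughly 20 ns, which is why the 80-90% hit ratios mentioned above matter so much.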
Cache transfer method
  1. CPU puts the address of a memory location in the Memory Address Register and requests data or an instruction to be retrieved from that address
  2. A test is performed to determine if the block containing this address is already in a Cache slot
    1. YES (hit): Transfer the information to CPU register – DONE
      1. Much faster than accessing main memory
    2. NO (miss): Get the block containing that address from Memory
      1. Allocate a free Cache block slot to the block
      2. In parallel: transfer the data to the CPU and load the block into the slot - DONE
      3. A bit slower than accessing main memory directly
The segmented approach to memory management
  • Programs often divide memory into areas
    • Code: may have several subroutines or modules
    • Data: may be organised into variables, tables, lists…
  • Overheads
    • Each segment must have own base and limit registers
    • Each segment has its own page table
In the segmented memory allocation scheme, each job is divided into several segments of different sizes, one for each module that contains pieces that perform related functions. Note that it is based upon the concept of programming modules.

Modules are placed in different-sized segments. Because a whole module (for example, a loop) stays together in one segment, this reduces page faults compared with a paging scheme, which uses pages of the same size and can split a loop across pages. Also note that memory is allocated dynamically.

The Segment Map Table (SMT) records, for each segment, its number, length, access rights, status, and (when it is loaded into memory) its location in memory.

References
  1. McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning. 
  2. Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Tuesday, April 17, 2012

CRM and data mining Day 08

The following figure shows data mining at the intersection of many disciplines.

The following figure shows different data types that we encounter in data mining.




The following figure shows a taxonomy of data mining.



The following figure shows the data mining process.


The following figure shows the data preparation process.



The following figure shows the data mining (SEMMA) process.



Split the data into two mutually exclusive sets: training (~70%) and testing (~30%).


For ANN, the data is split into three sub-sets (training [~60%], validation [~20%], testing [~20%]).
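The three-way split used for ANNs can be sketched as follows; the 60/20/20 fractions follow the notes, while the seed and the toy data are arbitrary:

```python
import random

def split_dataset(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle and split rows into training/validation/testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # reproducible shuffle
    n = len(rows)
    cut1 = int(fractions[0] * n)
    cut2 = cut1 + int(fractions[1] * n)
    return rows[:cut1], rows[cut1:cut2], rows[cut2:]

train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

Shuffling before cutting keeps the three sets mutually exclusive while removing any ordering present in the source data.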

The following figure shows the ROC curve.


Gini index determines the purity of a specific class as a result of a decision to branch along a particular attribute/value (e.g. CART).



Information gain uses entropy to measure the extent of uncertainty or randomness of a particular attribute/value split (e.g. ID3, C4.5, C5).
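Both impurity measures can be computed directly from a class distribution; a minimal sketch with a hypothetical two-class node:

```python
import math

def gini(counts):
    """Gini impurity of a class distribution (0 = pure node)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy in bits (1.0 = 50/50 for two classes)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c)

print(gini([5, 5]))     # 0.5  (maximally impure two-class node)
print(entropy([5, 5]))  # 1.0
print(gini([10, 0]))    # 0.0  (pure node)
```

A split is chosen by comparing the impurity of the parent node against the weighted impurity of its children: CART uses the Gini drop, ID3/C4.5/C5 use the entropy drop (information gain).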


Algorithms are available for generating association rules.
  1. Apriori
  2. Eclat
  3. FP-Growth
  4. Derivatives and hybrids of the three

The following figure shows biological versus artificial neural networks.



References
  1. FIT5158 Monash University Lecture Notes, 2012
  2. Andy Oppel (2011), Database Demystified, 2nd Ed, McGraw-Hill.
  3. Efraim Turban et al. (2011), Business Intelligence: A Managerial Approach, 2nd Ed, Prentice Hall.

Wednesday, April 11, 2012

Operating Systems Day 03

Every Operating System has sub-system components.
  1. Memory Manager
  2. Process Manager
  3. Device Manager
  4. File Manager
  5. User Command Interface
  6. Network Manager
The roles of the Memory Manager are
  1. Receiving requests for memory storage from other services.
  2. Checking the validity of each request and managing the allocation and release of RAM.
  3. Protecting allocated RAM from corruption.
Memory Manager employs four types of memory allocation schemes as follows.
  1. Single-user systems
  2. Fixed partitions
  3. Dynamic partitions
  4. Relocatable dynamic partitions
A program being executed is called a process. The roles of the Process Manager are
  1. Allocating RAM for each program
  2. Allocating the CPU to each process using a Scheduler
Device Manager: monitors every device and control unit so that it can most effectively manage the device


File Manager: keeps track of every file in the system. It enforces access control policies and optimises access to improve overall performance.


User Command Interface: provides a standard set of interfaces for use by applications and an operational environment for users:
  1.  command.com
  2.  Unix Shell
  3.  Graphical user interface (GUI)
Network Manager: More advanced systems will also provide network management services


The cooperation of the above sub-components is as follows.
When a user executes a command:
  1. User Interface Manager: accepts a request from the user to run a program
  2. User Interface Manager:
    1. Identifies the required program
    2. Invokes the Process Manager to run it
  3. Process Manager
    1.  Loads the program via the File Manager into the memory allocated by the Memory Manager
  4. As the program runs it may use the File Manager and Device Manager to display output and read/write data
  5. When the program is complete, the Process Manager signals all the other service managers to release their resources.
The amount of memory that can be addressed (used) is determined by the size of the address register.


Instructions consist of:
  1. an Operation Code (OpCode)
    1. e.g. MOV  bx  ax,    INC  cx,  HLT
    2. what action you want to do
  2. Zero or more operands
    1. what items you want to perform the action on
Multiprogramming systems divide the memory into PARTITIONS so that several independent programs can be in memory at the same time.



Programs that don’t fill a partition will waste the unused space. This was common in early computer operating systems.


Dynamic partitions allocate memory on the basis of program size.


Different algorithms exist for selecting jobs:

Memory consists of “holes” (i.e. unused space). Therefore, we need a method to keep track of hole sizes and locations.


Policies
  1. First fit: find the first hole that is big enough
    1. Maintain hole list in ascending order of memory address
  2. Best fit: find the smallest hole that is big enough
    1. Maintain hole list in ascending order of hole size.
  3. Worst fit: find the largest hole that is big enough
    1. Maintain hole list in descending order of hole size.
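The three policies can be sketched as one selection function over a hole list; the addresses and sizes below are hypothetical:

```python
def pick_hole(holes, size, policy="first"):
    """Return the index of the hole chosen for a request of `size`.

    holes: list of (start_address, length) free blocks, sorted by address.
    """
    candidates = [(i, h) for i, h in enumerate(holes) if h[1] >= size]
    if not candidates:
        return None                     # no hole is big enough
    if policy == "first":
        return candidates[0][0]         # lowest address that fits
    if policy == "best":
        return min(candidates, key=lambda c: c[1][1])[0]  # smallest fit
    if policy == "worst":
        return max(candidates, key=lambda c: c[1][1])[0]  # largest hole
    raise ValueError(policy)

holes = [(0, 50), (100, 20), (200, 90)]
print(pick_hole(holes, 20, "first"))  # 0 -> hole at address 0
print(pick_hole(holes, 20, "best"))   # 1 -> the 20-byte hole (exact fit)
print(pick_hole(holes, 20, "worst"))  # 2 -> the 90-byte hole
```

In a real allocator the list orderings from the notes (by address or by size) avoid scanning every hole; the sketch scans for clarity.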
Fragmentation of memory is the breaking up of memory into multiple “chunks”.  



There are two types of fragmentation of memory.
Internal Fragmentation (fixed or static partitions)
  1. Partition is larger than job needs
  2. Excess unused memory fragment cannot be used for anything else
Internal fragmentation: a situation in which a fixed partition is only partially used by the program; the remaining space within the partition is unavailable to any other job and is therefore wasted.

External Fragmentation (dynamic partitions)
  1. As memory is allocated, it leaves many small holes
  2. Memory holes are non-contiguous
  3. Sufficient memory may be available but it cannot be used.
External fragmentation: a situation in which the dynamic allocation of memory creates unusable fragments of free memory between blocks of busy, or allocated, memory.

Relocation register: a register that contains the value that must be added to each address referenced in the program so that it will be able to access the correct memory addresses after relocation.

We first need to make sure that each process has a separate memory space.

The BASE register holds the smallest legal physical memory address; the LIMIT register specifies the size of the range.

Main memory and registers are the only storage the CPU can access directly.


Cache sits between main memory and CPU registers.

A pair of base and limit registers define the logical address space.

Logical address: generated by the CPU; also referred to as a virtual address.

The user program deals with logical addresses; it never sees the real physical addresses.

Physical memory address = logical address + relocation register

Relocation registers are used to protect user processes from each other, and from changing operating-system code and data.

Limit register: contains the range of logical addresses; each logical address must be less than the limit register.
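Address translation with relocation and limit registers can be sketched as follows; the values are hypothetical, and raising `MemoryError` simply stands in for the hardware trap:

```python
def relocate(logical, relocation, limit):
    """Relocate a logical address, trapping on out-of-range access."""
    if logical >= limit:          # must be less than the limit register
        raise MemoryError("address outside the process's space")
    return logical + relocation   # physical = logical + relocation

print(relocate(100, relocation=14000, limit=500))  # 14100
```

Because the check happens before the addition, a process can never form a physical address outside its own partition, which is how relocation registers protect processes from each other.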



PAGING: A technique for memory organisation
  1. previous techniques required contiguous blocks of memory
  2. PAGING allows a program to use non-contiguous memory
    1. no longer needs a single contiguous area of RAM
To use paging, the memory manager:
  1. Breaks physical memory into Frames
  2. Breaks logical memory into Pages of same size
    1. Use a Page Table to link Pages to Frames
Each process has its own page table.
Instead of loading all of the program into memory frames, only the pages needed for execution are loaded.
Most programs only use a small area of code at any one time, so the number of frames used is often a lot less than the total size of the program.


Related to the idea of Virtual Memory, using disk space as an extension of memory.


Paging and the Page Table




Address translation



Thrashing: the system spends all the time swapping pages in and out of memory. It happens when a page fault causes a frame to be replaced, which must then be reloaded again straight away.








Memory management checks that the computed physical address does not exceed the size of the partition.


little-endian format


References

  1. McHoes, A., & Flynn, I.M. (2008). Understanding operating systems (5th ed.). CENGAGE Learning.
  2. Melbourne Institute of Technology (Semester 1, 2012). BN104 Operating Systems Lecture Notes.

Thursday, April 5, 2012

Multimedia Day 04


Developing Flash-based animations

1. Convert the image into a symbol (F8).
2. Go to the first keyframe and move the object to the desired place.
3. Create a motion tween between two keyframes.
4. Move the first keyframe to the position where you want it to start.
5. Make sure the object continues to exist to the end of the animation (F5).
6. Use the Control menu to test the movie.
7. Adjust opacity for gradually appearing.
8. Insert a blank keyframe using right-click or F7.
F6 inserts a keyframe.
Use "Distribute to Layers" for all objects.

CRM and data mining Day 07


It costs six times more to sell to a new customer than to an existing one.
A typical dissatisfied customer will tell 8-10 people.
By increasing the customer retention rate by 5%, profits could increase by (e.g.) 85% (or more).
Odds of selling to new customers = 15%, as compared to those for existing customers (50%).
70% of the complaining customers will remain loyal if problem is solved.
90% of companies do not have the sales and service integration to support e-commerce.

Customers are not all "created equal".
Some customers are more valuable than others.
Therefore, for the company, not all customers should be treated equally.

The figure below shows different human life stages.




Data mining in the customer life cycle.


Data Cube
The dimensions allow the store to keep track of things like monthly sales of items and the branch locations where items are sold.

The figure below shows a 3D data cube.



The figure below shows a 4D data cube.



The figure below shows a lattice of cuboids.


A comparison of OLTP and OLAP.



References
  1. FIT5158 Monash University Lecture Notes, 2012
  2. Andy Oppel (2011), Database Demystified, 2nd Ed, McGraw-Hill.

Monday, April 2, 2012

Artificial neural networks Day 05

For SOM training, the weight vector associated with each neuron moves to become the center of a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology should also move close to each other in the input space; therefore, it is possible to visualize a high-dimensional input space in the two dimensions of the network topology.

The iris dataset has four input attributes, so there are four elements in each input vector and the input space is four-dimensional. The weight vectors (cluster centers) fall within this space.

In the below figure, the blue hexagons represent the neurons. The red lines connect neighboring neurons. The colors in the regions containing the red lines indicate the distances between neurons. The darker colors represent larger distances, and the lighter colors represent smaller distances. A band of dark segments crosses from the lower-center region to the upper-right region. The SOM network appears to have clustered the flowers into two distinct groups.



References
  1. MATLAB 2011b Help Documentation

Sunday, April 1, 2012

Multimedia Day 03

Storytelling

There are six elements in storytelling:
  1. Character. The story is about characters. A story needs to tell the background of the characters, the emotional involvement (i.e. give viewers enough information about the characters so they can relate to them and become emotionally involved), the character arcs (i.e. the changes of the characters through the story). Good stories will show how a character changes throughout the presentation.
  2. Plot: how you tell what happens (or what happened) in a story. It's not simply what happened, but the arrangement of how the story is presented.
  3. Conflict: drives your characters to action. It's the problem the characters are trying to solve in a story.
  4. Resolution: where you tell how the issues panned out. The resolution should tie up all the loose ends and bring the conflict to a successful close.
  5. Setting: simply where the story takes place.
  6. Theme is the point you're trying to get across by presenting the story. It's the reason for the project. It may involve a moral imperative you're trying to teach the viewer. Ask yourself: why are we doing this project? What is the single conclusion the viewers should reach after experiencing the presentation. How should viewers act after experiencing this presentation?
Traditionally, there are three ACTS in any multimedia project (e.g. movie, tv show, a novel).
  1. Act 1: introduces the main characters, establishes the situation, and begins to establish the setting. Most of the time, Act 1 introduces the conflict the characters are experiencing.
  2. Act 2: if Act 1 introduces the conflict or the goal the hero must achieve, then Act 2 presents a series of obstacles that must be overcome.
  3. Act 3: brings the story to an end. It brings about the resolution to the conflict established earlier.
References
  1. Villalobos, R. (2008). Exploring multimedia for designers. Thomson Delmar Learning.

Multimedia Day 02

Animation

Slow in and slow out is a good way to approximate realistic movement in animation.

There are two ways of animating objects in Flash: frame-by-frame (cel-based) animation and tweened animation.

Text filled with images













References

  1. Villalobos, R. (2008). Exploring multimedia for designers. Thomson Delmar Learning.
