So I am working on an assignment and I am having some issues understanding arrays in this type of code (keep in mind that my knowledge of this stuff is limited). My code is supposed to ask the user to enter the number of values that will be put in an array of SDWORDs, and then call a procedure that has the user input the numbers. I have the part done below that asks the user for the amount (saved in "count"), but I am struggling with the procedure part. For example, with my code below, if they enter 5 then the procedure that I have to make would require them to input 5 numbers that would go into an array.
The problem I am facing is that I'm not sure how to actually set up the array. It can contain anywhere between 2 and 12 numbers, which is why I have the compare set up in the code below. Let's say, for example, that the user inputs that they will enter 5 numbers and I set it up like this...
.data
array SDWORD 5
The problem I am having is that I'm not sure if that is saying the array will hold 5 values or if just one value in the array is 5. I need the number of values in the array to be equal to "count". "count", as I have set up below, is the number of values that the user is going to enter.
Also, I obviously know how to set up the procedure like this...
EnterValues PROC
ret
EnterValues ENDP
I just don't know how to implement something like this. All of the research that I have done online is only confusing me more, and none of the examples I have found ask the user to enter how many numbers will be in the array before they actually enter any numbers into it. I hope what I described makes sense. Any input on what I could possibly do would be great!
INCLUDE Irvine32.inc
.data
count SDWORD ?
prompt1 BYTE "Enter the number of values to sort",0
prompt2 BYTE "Error. The number must be between 2 and 12",0
.code
Error PROC
mov edx, OFFSET prompt2
call WriteString
exit ; exit ends program after error occurs
Error ENDP
main PROC
mov edx, OFFSET prompt1
call WriteString ; prints out prompt1
call ReadInt
mov count, eax ; save returned value from eax to count
cmp count, 12
jle Loop1 ; If count is less than or equal to 12 jump to Loop1, otherwise continue with Error procedure
call Error ; performs Error procedure which will end the program
Loop1: cmp count, 2
jge Loop2 ; If count is greater than or equal to 2 jump to Loop2, otherwise continue with Error procedure
call Error ; performs Error procedure which will end the program
Loop2: exit
main ENDP
END main
============EDIT==============
I came up with this...
EnterValues PROC
mov ecx, count
mov edx, 0
Loop3:
mov eax, ArrayOfInputs[edx * 4]
call WriteInt
call CrLf
inc edx
dec ecx
jnz Loop3
ret
EnterValues ENDP
.data
array SDWORD 5
defines one SDWORD with the initial value 5 in the DATA section and gives it the name "array".
You might want to use the DUP operator
.data
array SDWORD 12 DUP (5)
This defines twelve SDWORDs and initializes each of them with the value 5. If the initial value doesn't matter, i.e. you want an uninitialized array, change the initial value to '?':
array SDWORD 12 DUP (?)
MASM may now create a _BSS segment for it. To force that, declare the array in the .data? section:
.data?
array SDWORD 12 DUP (?)
The symbol array is used in a MASM program as a constant offset to the address of the first entry. Use an additional index to address subsequent entries, for example:
mov eax, [array + 4] ; second SDWORD
mov eax, [array + esi]
Pointer arithmetic:
lea esi, array ; copy address into register
add esi, 8 ; move pointer to the third entry
mov eax, [esi] ; load eax with the third entry
lea esi, array + 12 ; copy the address of the fourth entry
mov eax, [esi] ; load eax with the fourth entry
In every case you get an array with a fixed size. It is up to you to fill only the first count entries.
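The same idea, a fixed-capacity array of which only the first count entries are used, can be sketched in C. This is only an illustration under assumptions: the names `values`, `enter_values`, `MIN_VALUES` and `MAX_VALUES` are made up here, and the numbers come from an input buffer instead of Irvine32's ReadInt.

```c
#define MIN_VALUES 2
#define MAX_VALUES 12   /* capacity, mirrors "array SDWORD 12 DUP (?)" */

/* Storage always has MAX_VALUES entries; only the first `count` are used. */
static int values[MAX_VALUES];

/* Hypothetical helper: copies `count` numbers from `input` into `values`.
   Returns 0 on success, -1 if count is outside the range 2..12. */
int enter_values(const int *input, int count)
{
    if (count < MIN_VALUES || count > MAX_VALUES)
        return -1;
    for (int i = 0; i < count; i++)
        values[i] = input[i];   /* comparable to: mov array[esi*4], eax */
    return 0;
}
```

In the assembly version the loop body would call ReadInt and store eax at array[esi*4], with the loop counter loaded from count.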
Is there a general strategy to create an efficient bit permutation algorithm? The goal is to create a fast, branch-less and, if possible, LUT-less algorithm. I'll give an example:
A 13 bit code is to be transformed into another 13 bit code according to the following rule table:
BIT | INPUT (DEC) | INPUT (BIN)   | OUTPUT (BIN)  | OUTPUT (DEC)
0   | 1           | 0000000000001 | 0000100000000 | 256
1   | 2           | 0000000000010 | 0010000000000 | 1024
2   | 4           | 0000000000100 | 0100000000000 | 2048
3   | 8           | 0000000001000 | 1000000000000 | 4096
4   | 16          | 0000000010000 | 0000001000000 | 64
5   | 32          | 0000000100000 | 0000000100000 | 32
6   | 64          | 0000001000000 | 0001000000000 | 512
7   | 128         | 0000010000000 | 0000000010000 | 16
8   | 256         | 0000100000000 | 0000000001000 | 8
9   | 512         | 0001000000000 | 0000000000010 | 2
10  | 1024        | 0010000000000 | 0000000000001 | 1
11  | 2048        | 0100000000000 | 0000000000100 | 4
12  | 4096        | 1000000000000 | 0000010000000 | 128
Example: If the input code is 1+2+4096=4099 the resulting output would be 256+1024+128=1408
A naive approach would be:
OUTPUT = ((INPUT AND 0000000000001) << 8) OR ((INPUT AND 0000000000010) << 9) OR ((INPUT AND 0000000000100) << 9) OR ((INPUT AND 0000000001000) << 9) OR ...
It means we have 3 instructions per bit (AND, SHIFT, OR) = 39-1 (last OR omitted) instructions for the above example. Instead we could also use a combination of left and right shifts to potentially reduce code size (depends on target platform), but this will not decrease the amount of instructions.
When inspecting the example table, you will of course notice a few obvious possibilities for optimization, for example in lines 2/3/4 (bits 1 to 3), which can be combined as ((INPUT AND 0000000001110) << 9). But beyond that it becomes a difficult and tedious task.
Are there general strategies? I think using Karnaugh-Veitch maps to simplify the expression could be one approach. However, that is pretty difficult for 13 input variables. Also, the resulting expression would only be a combination of ORs and ANDs.
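As a sketch of the mask-and-shift approach in C (the function and array names are made up for illustration): `permute_naive` handles one bit at a time using the destination positions from the table, while `permute_grouped` merges bits that share the same shift distance. Besides bits 1 to 3 (mask 0x000E, shifted left by 9), bits 8 and 12 both move right by 5 and can share one mask.

```c
/* Destination position for each source bit 0..12, copied from the table. */
static const unsigned char dest[13] = {8, 10, 11, 12, 6, 5, 9, 4, 3, 1, 0, 2, 7};

/* One AND, shift and OR per bit. */
unsigned permute_naive(unsigned x)
{
    unsigned out = 0;
    for (int bit = 0; bit < 13; bit++)
        out |= ((x >> bit) & 1u) << dest[bit];
    return out;
}

/* Bits with the same shift distance are handled with a single mask. */
unsigned permute_grouped(unsigned x)
{
    return ((x & 0x0001u) << 8)    /* bit 0        -> 8           */
         | ((x & 0x000Eu) << 9)    /* bits 1-3     -> 10-12       */
         | ((x & 0x0010u) << 2)    /* bit 4        -> 6           */
         |  (x & 0x0020u)          /* bit 5 stays in place        */
         | ((x & 0x0040u) << 3)    /* bit 6        -> 9           */
         | ((x & 0x0080u) >> 3)    /* bit 7        -> 4           */
         | ((x & 0x1100u) >> 5)    /* bits 8, 12   -> 3, 7        */
         | ((x & 0x0200u) >> 8)    /* bit 9        -> 1           */
         | ((x & 0x0400u) >> 10)   /* bit 10       -> 0           */
         | ((x & 0x0800u) >> 9);   /* bit 11       -> 2           */
}
```

This reduces the 13 mask/shift terms to 10, but finding such groupings by hand is exactly the tedious part.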
For bit permutations, several strategies are known that work in certain cases. There's a code generator at https://programming.sirrida.de/calcperm.php which implements most of them. However, in this case, it seems to find only basically the strategy you suggested, indicating that it seems hard to find any pattern to exploit in this permutation.
If one big lookup table is too much, you can try to use two smaller ones.
1. Take the 7 lower bits of the input, look up a 16-bit value in table A.
2. Take the 6 higher bits of the input, look up a 16-bit value in table B.
3. OR the values from 1. and 2. to produce the result.
Table A needs 128*2 bytes, table B needs 64*2 bytes, that's 384 bytes for the lookup tables.
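A minimal sketch of the two-table idea in C, under the assumption that the tables are filled once at start-up from the bit mapping in the question (they could equally be precomputed constants). The names `table_a`, `table_b`, `init_tables` and `permute_lut` are invented for this example.

```c
#include <stdint.h>

/* Destination position for each source bit 0..12, from the question's table. */
static const unsigned char dest[13] = {8, 10, 11, 12, 6, 5, 9, 4, 3, 1, 0, 2, 7};

static uint16_t table_a[128];   /* indexed by input bits 0-6,  256 bytes */
static uint16_t table_b[64];    /* indexed by input bits 7-12, 128 bytes */

/* Bit-by-bit reference permutation, used only to fill the tables. */
static uint16_t permute_slow(unsigned x)
{
    unsigned out = 0;
    for (int bit = 0; bit < 13; bit++)
        out |= ((x >> bit) & 1u) << dest[bit];
    return (uint16_t)out;
}

void init_tables(void)
{
    for (unsigned i = 0; i < 128; i++)
        table_a[i] = permute_slow(i);        /* low 7 bits in place   */
    for (unsigned i = 0; i < 64; i++)
        table_b[i] = permute_slow(i << 7);   /* high 6 bits in place  */
}

/* Two lookups and one OR per permutation. */
uint16_t permute_lut(unsigned x)
{
    return (uint16_t)(table_a[x & 0x7F] | table_b[(x >> 7) & 0x3F]);
}
```

`init_tables()` has to run once before the first call to `permute_lut()`.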
This is a hand-optimised multiple LUT solution, which doesn't really prove anything other than that I had some time to burn.
Multiple small lookup tables can occasionally save time and/or space, but I don't know of a strategy to find the optimal combination. In this case, the best division seems to be three LUTs of three bits each (bits 4-6, 7-9 and 10-12), totalling 24 bytes (each table has 8 one-byte entries), plus a simple shift to cover bits 1 through 3, and another simple shift for the remaining bit 0. Bit 5, which is untransformed, was also a tempting target but I don't see a good way to divide bit ranges around it.
The three look-up tables have single-byte entries because the range of the transformations for each range is just one byte. In fact, the transformations for two of the bit ranges fit entirely in the low-order byte, avoiding a shift.
Here's the code:
unsigned short permute_bits(unsigned short x) {
#define LUT3(BIT0, BIT1, BIT2) \
{ 0, (BIT0), (BIT1), (BIT1)+(BIT0), \
(BIT2), (BIT2)+(BIT0), (BIT2)+(BIT1), (BIT2)+(BIT1)+(BIT0)}
static const unsigned char t4[] = LUT3(1<<(6-3), 1<<(5-3), 1<<(9-3));
static const unsigned char t7[] = LUT3(1<<4, 1<<3, 1<<1);
static const unsigned char t10[] = LUT3(1<<0, 1<<2, 1<<7);
#undef LUT3
return ( (x&1) << 8) // Bit 0
+ ( (x&14) << 9) // Bits 1-3, simple shift
+ (t4[(x>>4)&7] << 3) // Bits 4-6, see below
+ (t7[(x>>7)&7] ) // Bits 7-9, three-bit lookup for LOB
+ (t10[(x>>10)&7] ); // Bits 10-12, ditto
}
Note on bits 4-6
Bit 6 is transformed to position 9, which is outside of the low-order byte. However, bits 4 and 5 are moved to positions 6 and 5, respectively, and the total range of the transformed bits is only 5 bit positions. Several different final shifts are possible, but keeping the shift relatively small provides a tiny improvement on x86 architecture, because it allows the use of LEA to do a simultaneous shift and add. (See the second last instruction in the assembly below.)
The intermediate results are added instead of using boolean OR for the same reason. Since the sets of bits in each intermediate result are disjoint, ADD and OR have the same result; using add can take advantage of chip features like LEA.
Here's the compilation of that function, taken from http://gcc.godbolt.org using gcc 12.1 with -O3:
permute_bits(unsigned short):
mov edx, edi
mov ecx, edi
movzx eax, di
shr di, 4
shr dx, 7
shr cx, 10
and edi, 7
and edx, 7
and ecx, 7
movzx ecx, BYTE PTR permute_bits(unsigned short)::t10[rcx]
movzx edx, BYTE PTR permute_bits(unsigned short)::t7[rdx]
add edx, ecx
mov ecx, eax
sal eax, 9
sal ecx, 8
and ax, 7168
and cx, 256
or eax, ecx
add edx, eax
movzx eax, BYTE PTR permute_bits(unsigned short)::t4[rdi]
lea eax, [rdx+rax*8]
ret
I left out the lookup tables themselves because the assembly produced by GCC isn't very helpful.
I don't know if this is any quicker than @trincot's solution (in a comment); a quick benchmark was inconclusive, but it looked to be a few percent quicker. But it's quite a bit shorter, possibly enough to compensate for the 24 bytes of lookup data.
I want to write a simple assembly language program to sort student names according to their grades.
I am just using:
.data
.code
I tried this bubble sort, but it only works for numbers. How can I add names for the students?
.data
array db 9,6,5,4,3,2,1
count dw 7
.code
mov cx,count
dec cx
nextscan:
mov bx,cx
mov si,0
nextcomp:
mov al,array[si]
mov dl,array[si+1]
cmp al,dl
jnc noswap
mov array[si],dl
mov array[si+1],al
noswap:
inc si
dec bx
jnz nextcomp
loop nextscan
Long ago, one of the most common ways to represent data was with what were called fixed-length fields. It wasn't uncommon to find all related data in one place like this:
Student: db 72, 'Marie          '
         db 91, 'Barry          '
         db 83, 'Constantine    '
         db 59, 'Wil-Alexander  '
         db 97, 'Jake           '
         db 89, 'Ceciel         '
This is doable, as each record is 16 bytes long (one grade byte plus a 15-byte name), and that is the way data used to be constructed, in powers of 2. So the data length was 2, 4, 8, 16, 32, 64 and so on. It didn't have to be this way, and a lot of the time it wasn't, but sizes like that made the code simpler.
The problem is that each time we want to sort, all the data has to be moved, so the relational database was born. Here we separate the variable data from the static data.
Student: db 'Marie           '
         db 'Barry           '
         db 'Constantine     '
         db 'Wil-Alexander   '
         db 'Jake            '
         db 'Ceciel          '
Grades: db 72, 0
db 91, 1
db 83, 2
db 59, 3
db 97, 4
db 89, 5
dw -1 ; Marks end of list
Not only is this easier to manage in the program, it also makes it easier to add more grades, even several grades for the same person. Here is an example of how the code would do the comparisons.
mov si, Grades
mov bl, 0
push si
L0: lodsw
cmp ax, -1
jz .done
cmp [si-4], al
jae L0
.... Exchange Data Here ....
bts bx, 0
jmp L0
.done:
pop si
btc bx, 0
jc L0 - 1
ret
After the routine has been executed, the contents of Grades are as follows:
61 04 5B 01 59 05 53 02 48 00 3B 03
I do have a working copy of this program tested in DOSBOX and because this is a homework assignment, I'm not going to hand it to you on a silver platter, but 95% of the work is done. All you need to do before handing in is make sure you can explain why BTS & BTC makes the bubble work and implement something that will exchange data.
If you needed to display this data, you'd need to devise a conversion routine from binary to decimal. To find the name, simply multiply the index number associated with each grade by 16 and add the address of Student to it; that gives you a pointer to the appropriate name.
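To make the index arithmetic concrete, here is a sketch in C rather than assembly. It makes a few assumptions that are not in the answer: the names are padded with NUL bytes instead of spaces so they can be used as C strings, the sort is descending, and the identifiers (`students`, `struct entry`, `sort_by_grade`, `name_of`) are invented for the example.

```c
#include <stddef.h>

#define NAME_LEN 16   /* every name occupies exactly 16 bytes */

/* Fixed-length name records; unused bytes are zero-filled. */
static const char students[6][NAME_LEN] = {
    "Marie", "Barry", "Constantine", "Wil-Alexander", "Jake", "Ceciel"
};

/* One grade plus the index of the matching name, like "db 72, 0". */
struct entry {
    unsigned char grade;
    unsigned char index;
};

/* Bubble sort, highest grade first. Only the 2-byte entries move;
   the names stay where they are. */
void sort_by_grade(struct entry *e, size_t n)
{
    int swapped;
    do {
        swapped = 0;
        for (size_t i = 1; i < n; i++) {
            if (e[i - 1].grade < e[i].grade) {
                struct entry tmp = e[i - 1];
                e[i - 1] = e[i];
                e[i] = tmp;
                swapped = 1;   /* another pass is needed */
            }
        }
    } while (swapped);
}

/* Address of Student + index * 16 gives the name belonging to an entry. */
const char *name_of(const struct entry *e)
{
    return &students[0][0] + (size_t)e->index * NAME_LEN;
}
```

After sorting the six entries from the Grades table, the first entry is grade 97 with index 4, and `name_of` on it points at "Jake".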
Sort pointers to name, grade structs, or indices into separate name and grade arrays.
That's one extra level of indirection in the compare, but not in the swap.
I have to create a loop that will have the end result that prints out "LoopLoopLoopLoopLoop" (note only 5 times). What I have so far is this..
.data
STRING_1 DB "Loop"
.code
mov ecx, 5 ; Perform loop 5 times
printLoop:
LEA DX, STRING_1
loop printLoop
call DumpRegs
I'm not even 100% sure if what I have is even correct but I suppose the main question I have is how do I make it so that all of the outcomes are printed on the same line?
I'm new to assembly language and learning it for exams. I am a programmer and have worked in C, C++, Java and ASP.NET.
I have TASM on Windows XP.
I want to know how data is stored in memory or in a register. I want to know the process. I believe it is something like this:
While Entering data, eg. number:
Input Decimal No. -> Converted to Hex -> Store ASCII of hex in registers or memory.
While Fetching data:
ASCII of hex in registers or memory -> Converted to Hex -> Show Decimal No. on monitor.
Is this correct? If not, can anybody explain it with a simple example?
Ok, Michael: See the code below where I am trying to add two 1 digit numbers to display 2 digit result, like 6+5=11
Sseg segment stack
ends
code segment
;30h to 39h represent numbers 0-9
MOV BX, '6' ; ASCII CODE OF 6 IS STORED IN BX, equal to 36h
ADD BX, '5' ; ASCII CODE OF 5 (equal to 35h) IS ADDED IN BX, i.e total is 6Bh
Thanks Michael... I accept my mistake....
Ok, so here, BX=006Bh, right? Does it mean, BL=00 and BH=6B?
However, if I do so, I can't figure out how to show the result 11.
Hey Blechdose,
Can you help me with one more problem? I am trying to compare 2 values. If both are the same then dl=1, otherwise dl=0. But the following code displays 0 even though the values are the same. Why is it not jumping?
sseg segment stack
ends
code segment
assume cs:code
mov dl,0
mov ax,5
mov bx,5
cmp ax,bx
jne NotEqual
je equal
NotEqual:
mov dl,0
add dl,30h
mov ah,02h
int 21h
mov ax,4c00h
int 21h
equal: mov dl,1
add dl,30h
mov ah,02h
int 21h
mov ax,4c00h
int 21h
code ends
end NotEqual
end equal
Registers consist of bits. A bit can have the logic value 0 or 1. It is a "logic value" for us, but actually it is represented by some kind of voltage inside the hardware. For example 4-5 Volt is interpreted as "logic 1" and 0-1 Volt as "logic 0". The BX register has 16 of those bits.
Let's say the current content of BX (the base register) is: 0000000000110110. Because those long lines of 0s and 1s are very hard for humans to read, we combine every 4 bits into 1 hex digit to get a more readable format to work with. The CPU does not know what a hex or decimal number is. It can only work with binary code. Okay, let us use a more readable format for our BX register:
0000 0000 0011 0110 (actual BX content)
0 0 3 6 (HEX format for us)
54 (or corresponding decimal value)
When you send this value (36h) to your output terminal, it will interpret this value as an ASCII character. Thus it will display a "6" for the 36h value.
When you want to add 6 + 2 in assembly, you put 0110 (6) and 0010 (2) in the registers. Your assembler, TASM, does the work for you. It allows you to write '6' (ASCII) or 6h (hex) or even 6 (decimal) in the asm source code and converts that for you into a binary number, which the register accepts. WARNING: '6' will not put the value 6 into the register, but the ASCII code for 6. You cannot calculate with that directly.
Example: 6+2=8
mov BX, 6h ; We put 0110 (6) into BX. (actually 0000 0000 0000 0110,
; because BX is 16 Bit, but I will drop those leading 0s)
add BX, 2h ; we add 0010 (2) to 0110 (6). The result 1000 (8) is stored in BX.
add BX, 30h ; we add 00110000 (30h). The result 00111000 (38h) is stored in BX.
; 38h is the ASCII-code, which your terminal output will interpret as '8'
When you do a calculation like 6+5 = 11, it will be even more complicated, because you have to convert the result 1011 (11) into 2 ASCII-Digits '1' and '1' (3131h = 00110001 00110001)
After adding 6 (0110) + 5 (0101) = 11 (1011), BX will contain this (without blanks):
0000 0000 0000 1011 (binary)
0 0 0 B (Hex)
11 (decimal)
|__________________|
BX
|________||________|
BH BL
BH is the higher Byte of BX, while BL is the lower byte of BX. In our example BH is 00h, while BL contains 0bh.
To display the result of your addition on your terminal output, you need to convert it to ASCII code. In this case, you want to display '11', so you need the ASCII character '1' twice. By looking at one of the hundreds of ASCII tables on the internet, you will find that the code for the ASCII character '1' is 31h. Consequently you need to send 3131h to your terminal:
0011 0001 0011 0001 (binary)
3 1 3 1 (hex)
12593 (decimal)
The trick is to divide your 11 (1011) by 10 with the div instruction. After the division by 10 you get a result and a remainder. You need to convert the remainder into an ASCII digit and save it in a buffer. Then you repeat the process by dividing the result from the last step by 10 again. You need to do this until the result is 0. The digits come out least significant first, so they have to be printed in reverse order. (Using the div instruction is a bit tricky. You will have to look that up yourself.)
binary (decimal):
divide 1011 (11) by 1010 (10):
result: 0001 (1) remainder: 0001 (1) -> convert remainder to ASCII
divide result by 1010 (10) again:
result: 0000 (0) remainder: 0001 (1) -> convert remainder to ASCII
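The divide-by-10 loop can be sketched in C like this. It is only an illustration of the method, not TASM code; the function name `to_decimal` is invented, and the caller is assumed to supply a buffer of at least 11 bytes.

```c
#include <stddef.h>

/* Converts `value` to decimal ASCII digits in `buf` (NUL-terminated).
   Returns the number of digits written. */
size_t to_decimal(unsigned value, char *buf)
{
    char tmp[16];
    size_t n = 0;

    /* Each division by 10 yields one digit as the remainder. */
    do {
        tmp[n++] = (char)('0' + value % 10);   /* remainder + 30h = ASCII digit */
        value /= 10;                           /* continue with the result      */
    } while (value != 0);

    /* Digits were produced least significant first, so reverse them. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

For the value 11 this stores the two bytes 31h 31h, which a terminal displays as "11".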
A binary compiled with gcc contains the following assembly:
8049264: 8d 44 24 3e lea 0x3e(%esp),%eax
8049268: 89 c2 mov %eax,%edx
804926a: bb ff 00 00 00 mov $0xff,%ebx
804926f: b8 00 00 00 00 mov $0x0,%eax
8049274: 89 d1 mov %edx,%ecx
8049276: 83 e1 02 and $0x2,%ecx
8049279: 85 c9 test %ecx,%ecx
804927b: 74 09 je 0x8049286
At first glance, I had no idea what it is doing at all. My best guess is some sort of memory alignment and clearing of a local variable (because a later rep stos fills the local variable's location with 0). The first few lines load an address into eax, move it to ecx and test whether the address is aligned, but I'm lost as to why this is happening. I want to know what exactly happens here.
It looks like initialising a local variable located at [ESP + 0x03e] to zeroes. At first, EDX is initialised to hold the address and EBX is initialised to hold the size in bytes. Then, it's checked whether EDX & 2 is nonzero; in other words, whether EDX as a pointer is wyde-aligned (2-byte aligned) but not tetra-aligned (4-byte aligned). (Assuming ESP is tetrabyte aligned, as it generally should be, EDX, which was initialised at 0x3E bytes above ESP, would not be tetrabyte aligned. But this is slightly beside the point.) If this is the case, the wyde from AX, which is zero, is stored at [EDX], EDX is incremented by two, and the counter EBX is decremented by two. Now, assuming ESP was at least wyde-aligned, EDX is guaranteed to be tetra-aligned. ECX is calculated to hold the number of tetrabytes remaining by shifting EBX right two bits, EDI is loaded from EDX, and the REP STOS stores that many zero tetrabytes at [EDI], incrementing EDI in the process. Then, EDX is loaded from EDI to get the pointer-past-space-initialised-so-far. Finally, if there were at least two bytes remaining uninitialised, a zero wyde is stored at [EDX] and EDX is incremented by two, and if there was at least one byte remaining uninitialised, a zero byte is stored at [EDX] and EDX is incremented by one. The point of this extra complexity is apparently to store most of the zeroes as four-byte values rather than single-byte values, which may, under certain circumstances and on certain CPU architectures, be slightly faster.
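The sequence described above corresponds roughly to the following C sketch. This is a reconstruction for illustration, not the code gcc actually compiled: the function name `zero_fill` is invented, the pointer is assumed to be at least 2-byte aligned, and `memcpy` from a zero constant stands in for the 2-byte and 4-byte stores.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Zeroes `size` bytes at `p`, using 4-byte stores for the bulk.
   Assumes `p` is at least 2-byte aligned, as in the description. */
void zero_fill(unsigned char *p, size_t size)
{
    const uint16_t zero16 = 0;
    const uint32_t zero32 = 0;

    /* Head: 2-aligned but not 4-aligned, so store one 16-bit zero first. */
    if (((uintptr_t)p & 2) != 0 && size >= 2) {
        memcpy(p, &zero16, 2);
        p += 2;
        size -= 2;
    }

    /* Bulk: size / 4 four-byte zero stores (the REP STOS part). */
    for (size_t n = size >> 2; n != 0; n--) {
        memcpy(p, &zero32, 4);
        p += 4;
    }

    /* Tail: at most one 16-bit store and one 8-bit store remain. */
    if (size & 2) {
        memcpy(p, &zero16, 2);
        p += 2;
    }
    if (size & 1)
        *p = 0;
}
```

With the 0xff-byte buffer from the question starting at an address with bit 1 set, this performs one 2-byte store, 63 4-byte stores and one final 1-byte store.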